NONLINEAR MARKOV CONTROL PROCESSES
AND GAMES: FINAL REPORT

Vassili N. Kolokoltsov


Abstract 

The project was devoted to the analysis of a new class of stochastic games that I called nonlinear Markov games, as they arise as a (competitive) controlled version of nonlinear Markov processes. The latter can be roughly characterized by the property that the future depends on the past not only via the present position (as in usual Markov processes), but also via its distribution. This class of games can model a variety of situations in economics, epidemics, statistical physics and pursuit-evasion processes.

Nonlinear Markov games can be considered as a systematic tool for modeling deception. In particular, in a game of pursuit-evasion, an evading object can create false targets or hide in order to deceive the pursuer. Thus, observing this object yields not its precise location but only its distribution, so that competitive control has to be built on the basis of the distribution of the present state. Moreover, by observing the actions of the evading object, one can draw conclusions about certain of its dynamic characteristics, which makes the (predicted) transition probabilities depend on the observed distribution via these characteristics. This is precisely the type of situation modeled by nonlinear Markov games.

Another key motivation arises from the steady increase in the complexity of modern technological development, which requires an appropriate (or, better, optimal) management of complex stochastic systems consisting of a large number of interacting components (agents, mechanisms, vehicles, subsidiaries, species, police units, etc.), which may have competitive or common interests. Carrying out a traditional Markov decision analysis for a large state space is often infeasible. However, under rather general assumptions, the limiting problem as the number of components tends to infinity can be described by a well manageable nonlinear deterministic evolution on measures, and its controlled version is given precisely by a nonlinear Markov control process or (in the case of competitive interests) by a nonlinear Markov game, which is what we investigate.

The results of the project concern fundamental mathematical questions of the theory of nonlinear Markov control processes and games, such as well-posedness and controllability, as well as more applied issues, such as the convergence of approximating schemes. The latter are linked with the interacting particle approximations introduced above.


1  Objectives  for  each  grant  year 

The overall aim of the project was to address both the fundamental questions of the theory of nonlinear Markov control processes and games, such as well-posedness and controllability, and the more applied issues, such as approximation and numerical schemes.
Simple illustrative examples to keep in mind were: (1) Pursuit-evasion: the evader produces false targets, so that the control of the pursuer should be based not on the observed position of the evader, but only on the observed distribution of this position; (2) Finance: observing the performance of a competitor company allows one to draw conclusions about the distribution of certain hidden internal parameters of this company, and hence to make decisions based on this distribution; (3) Similarly, the traces of the actions of terrorists or other organized crime groups can be used to assess the probability distribution of their actual states (physical locations, amount of equipment available, etc.), which again leads to the problem of control on the basis of knowledge of the probability law on the state space, thus relating nonlinear Markov control processes to methods of crime prevention (say, of terrorist attacks).

Another crucial point for modern modeling in finance or in inspection and crime-prevention measures is making decisions on the basis of risk characteristics like variance or VaR (Value at Risk), which are functions of the whole distribution and not only of the position of a process at a given time.

Let  us  state  the  objectives  for  each  year. 

Tasks  for  Year  1. 

Before plunging seriously into the control setting, the analysis of nonlinear Markov processes themselves was to be developed, starting with the simplest classes such as nonlinear Lévy processes and nonlinear Markov chains. The analysis had to include basic constructions, well-posedness issues, qualitative behavior and approximating schemes.

To pave the way for possibly wider applications, the links with concrete problems of natural science were to be explicitly established, including the models of non-equilibrium statistical mechanics, the replicator dynamics of multi-agent evolutionary games (evolutionary biology), and relevant models of financial dynamics and disease spreading.

Tasks  for  Year  2. 

The core of the proposed research was the development of the theory of nonlinear Markov processes and their controlled versions, including competitive control. Initiated in Year 1 mostly on the level of discrete models, this task was to be fully completed in Year 2.

Tasks for Year 3.

The main problem was to link the theoretical construction of nonlinear Markov processes with controlled systems of interacting particles, thereby connecting discrete approximations with algorithmic methods of numerical calculation and with more concrete applied models, such as decision making or the control of large robot swarms or large armies. Thus we had to establish a rigorous link with two-sided applicability. Firstly, in order to apply the theoretical results to concrete models of practical interest, numerical schemes for the solutions were to be developed together with appropriate estimates for the error terms. The most natural approximations and related algorithms are based on systems of a large number of interacting particles. On the other hand, solving the limiting nonlinear Markov control process can yield useful qualitative and quantitative asymptotics for the system of interacting particles.

As motivation for further research we indicated possible extensions to state spaces with nontrivial geometry, to controlled nonlinear quantum dynamical semigroups and related nonlinear quantum Markov processes, as well as to fully infinite-dimensional measure-valued controlled Markov processes and games.
2 Findings for each objective

2.1  Findings  for  year  1 

A large part of the work for Year 1 was devoted to discrete models.

We have summarized the nonlinear analogues of the basic theory of usual Markov chains, where measures (on a finite state space) are described by a finite-dimensional simplex. A discrete-space nonlinear Markov semigroup is a one-parameter semigroup of (possibly nonlinear) transformations of the unit simplex in $n$-dimensional Euclidean space (which represents the set of probability laws on a finite set of $n$ points). In the stochastic representation these transformations are given by stochastic matrices (as for usual Markov chains) depending on the position, i.e. on the current point $\mu$ of the simplex (nonlinearity!), so that one step maps $\mu$ to $\mu P(\mu)$; the elements of $P(\mu)$ specify nonlinear transition probabilities. Our first result yields the nonlinear analog of the basic convergence to a stationary regime from the theory of Markov chains, the basic condition being a certain mixing property of the nonlinear transition probabilities.
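The convergence to a stationary regime under mixing can be illustrated numerically. The following Python sketch is only an illustration under assumptions of our own: the three-state matrix `BASE`, the tilting constant `C` and the function names are hypothetical and are not taken from [1]. All entries of P(μ) are strictly positive (a simple mixing condition), and the dependence on μ is weak enough that the map μ ↦ μP(μ) contracts in the ℓ¹ norm, so iterates started from different laws approach the same fixed point.

```python
import math

# Hypothetical base stochastic matrix with strictly positive entries (mixing).
BASE = [
    [0.5, 0.3, 0.2],
    [0.2, 0.5, 0.3],
    [0.3, 0.2, 0.5],
]
C = 0.5  # hypothetical strength of the dependence of transitions on mu


def transition_matrix(mu):
    """Nonlinear transition probabilities P(mu): each row of BASE is tilted
    towards states carrying more mass under mu, then renormalized."""
    weights = [math.exp(C * m) for m in mu]
    rows = []
    for base_row in BASE:
        raw = [b * w for b, w in zip(base_row, weights)]
        total = sum(raw)
        rows.append([r / total for r in raw])
    return rows


def step(mu):
    """One step of the nonlinear chain: mu -> mu P(mu)."""
    p = transition_matrix(mu)
    return [sum(mu[i] * p[i][j] for i in range(len(mu))) for j in range(len(mu))]


def iterate(mu, steps=1000):
    for _ in range(steps):
        mu = step(mu)
    return mu


mu_a = iterate([1.0, 0.0, 0.0])
mu_b = iterate([0.0, 0.0, 1.0])
residual = max(abs(x - y) for x, y in zip(step(mu_a), mu_a))  # fixed-point check
gap = max(abs(x - y) for x, y in zip(mu_a, mu_b))  # independence of the start
print(mu_a, residual, gap)
```

Without the positivity of `BASE` (for instance, with a periodic base matrix) such convergence can fail, which is why a mixing condition appears in the result.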
In the case of a semigroup parametrized by continuous time, one defines its generator as the derivative of the semigroup at time zero. A stochastic representation for the generator means its representation by a Q-matrix (or infinitesimally stochastic matrix), again depending on the position, so that the evolution takes the form $\dot\mu = \mu Q(\mu)$. Examples are numerous: replicator dynamics, the Lotka-Volterra model, basic epidemic models, see [1].
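As an illustration of a position-dependent Q-matrix, the Python sketch below (the payoff matrix `A` and the function names are hypothetical) writes the replicator dynamics dx_i/dt = x_i((Ax)_i - x·Ax) in the form dx/dt = xQ(x), using pairwise-imitation rates q_ij(x) = x_j max((Ax)_j - (Ax)_i, 0) for i ≠ j. This is one possible stochastic representation, assumed here for illustration; summing gains and losses shows that xQ(x) coincides with the replicator field on the simplex.

```python
# Hypothetical 3x3 payoff matrix.
A = [
    [0.0, -1.0, 2.0],
    [2.0, 0.0, -1.0],
    [-1.0, 2.0, 0.0],
]


def fitness(x):
    """(A x)_i: payoff of pure strategy i against the population state x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def q_matrix(x):
    """Position-dependent Q-matrix: an i-player switches to j at rate
    x_j * max((Ax)_j - (Ax)_i, 0); diagonal chosen so rows sum to zero."""
    f = fitness(x)
    n = len(x)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = x[j] * max(f[j] - f[i], 0.0)
        q[i][i] = -sum(q[i][j] for j in range(n) if j != i)
    return q


def kolmogorov_rhs(x):
    """Right-hand side of dx/dt = x Q(x)."""
    q = q_matrix(x)
    n = len(x)
    return [sum(x[i] * q[i][j] for i in range(n)) for j in range(n)]


def replicator_rhs(x):
    """Classical replicator vector field x_i ((Ax)_i - x.Ax)."""
    f = fitness(x)
    mean = sum(xi * fi for xi, fi in zip(x, f))
    return [xi * (fi - mean) for xi, fi in zip(x, f)]


def euler(x, dt=1e-3, steps=2000):
    """Explicit Euler scheme for dx/dt = x Q(x); since the rows of Q sum to
    zero and dt is small, it keeps x on the simplex."""
    for _ in range(steps):
        x = [xi + dt * di for xi, di in zip(x, kolmogorov_rhs(x))]
    return x


x0 = [0.5, 0.3, 0.2]
print(kolmogorov_rhs(x0), replicator_rhs(x0))  # the two fields agree
xT = euler(x0)
```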

For the corresponding control processes we obtain nonlinear analogs of the basic result on long-time behavior, showing the existence of the limiting average income per unit of time and of stationary strategies (turnpikes), see [1]. Related results were later developed in [9] on somewhat more systematic and general grounds, which have not yet been fully exploited in the nonlinear case.

Let us point out the (not so obvious) place of usual stochastic control theory in this nonlinear setting. Namely, even assuming that the transition probabilities do not depend on the distribution does not reduce the problem to the usual stochastic control setting, but rather to a game with incomplete information, where the states are probability laws. That is, when choosing a move, the players do not know the position precisely, but only its distribution.
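A small sketch of why this is so, in Python and for a hypothetical two-state example: each control selects a fixed stochastic matrix, but since the controller only sees the law μ, the resulting map μ ↦ μP(u(μ)) on the simplex is nonlinear, even though no transition probability depends on μ directly. The matrices and the threshold rule `policy` are assumptions made for illustration.

```python
# Two hypothetical controls, each selecting a fixed stochastic matrix.
P = {
    "stay": [[0.9, 0.1], [0.1, 0.9]],
    "swap": [[0.2, 0.8], [0.8, 0.2]],
}


def policy(mu):
    """Hypothetical rule based only on the law mu (not on the position):
    use 'swap' when state 0 is less likely than state 1."""
    return "swap" if mu[0] < mu[1] else "stay"


def step(mu):
    """One step of the law: mu -> mu P(policy(mu))."""
    p = P[policy(mu)]
    return [sum(mu[i] * p[i][j] for i in range(2)) for j in range(2)]


def average(a, b, w=0.5):
    """Convex combination w * a + (1 - w) * b of two laws."""
    return [w * x + (1 - w) * y for x, y in zip(a, b)]


mu, nu = [0.8, 0.2], [0.2, 0.8]
lhs = step(average(mu, nu))        # image of the mixture
rhs = average(step(mu), step(nu))  # mixture of the images
print(lhs, rhs)  # [0.5, 0.5] versus approximately [0.71, 0.29]
```

With a fixed control (say, always "stay") the two printed vectors would coincide, as for a usual linear Markov chain.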

The analysis of nonlinear Markov processes was systematically developed from two complementary points of view: (i) analytic, based on the functional-analytic techniques of semigroups and operators, where the main object was the nonlinear kinetic equation in the weak form of the type

$$\frac{d}{dt}(f,\mu_t) = (L_{\mu_t} f, \mu_t) \qquad (1)$$

for the flow of Borel measures $\mu_t$ in $\mathbb{R}^d$, with a family of pseudo-differential generators of Markov processes of the type

$$L_\mu f(x) = \frac{1}{2}(G(x,\mu)\nabla,\nabla)f(x) + (b(x,\mu),\nabla f(x)) + \int \big(f(x+y) - f(x) - (\nabla f(x), y)\big)\,\nu(x,\mu;dy),$$

see [1], [4]; and (ii) probabilistic, based on the differential equations related to (1) and driven by nonlinear Lévy noise:

$$dX(t) = dY_t\big(X(t), \mathcal{L}(X(t))\big) \qquad (2)$$

in $\mathbb{R}^d$ ($\mathcal{L}(X)$ denotes the probability law of the random variable $X$), where $Y_t(z,\eta)$ is a family of Lévy processes specified by the Lévy-Khintchine generators

$$L[z,\eta] f(x) = \frac{1}{2}(G(z,\eta)\nabla,\nabla)f(x) + (b(z,\eta),\nabla f(x)) + \int \big(f(x+y) - f(x) - (\nabla f(x), y)\big)\,\nu(z,\eta;dy),$$

depending on a point $z$ and a probability measure $\eta$ in $\mathbb{R}^d$ as parameters, see [1], [3], [7] (and some details and complements in [2]). The construction is given explicitly via the nonlinear analog of the Itô-Euler approximation scheme. This scheme also supplies a numerical algorithm for the practical calculation of the solutions.
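The idea of turning such an Euler-type scheme into a numerical algorithm can be sketched in the simplest special case, a McKean diffusion with Brownian (rather than general Lévy) noise. In the Python sketch below the unknown law in the coefficients is replaced by the empirical measure of many simulated copies, a Monte Carlo variant assumed here for illustration; it is not claimed to be the construction of [1], [3], [7]. The drift b(x, μ) = -x + κ∫y μ(dy), the parameters and the function names are hypothetical. For this drift the mean m(t) solves m' = (κ - 1)m, which gives a simple check.

```python
import math
import random

KAPPA = 0.5  # hypothetical strength of the mean-field attraction


def drift(x, mean_of_law):
    """Hypothetical mean-field drift b(x, mu) = -x + KAPPA * (mean of mu)."""
    return -x + KAPPA * mean_of_law


def particle_euler(n_particles=2000, t_final=1.0, dt=0.01, sigma=0.5,
                   m0=2.0, seed=1):
    """Euler scheme for dX = b(X, L(X)) dt + sigma dW, with the law L(X(t))
    replaced by the empirical measure of n_particles simulated copies."""
    rng = random.Random(seed)
    xs = [rng.gauss(m0, 1.0) for _ in range(n_particles)]  # initial law N(m0, 1)
    sqrt_dt = math.sqrt(dt)
    for _ in range(int(round(t_final / dt))):
        mean_of_law = sum(xs) / len(xs)  # empirical proxy for the law at time t
        xs = [x + drift(x, mean_of_law) * dt + sigma * sqrt_dt * rng.gauss(0.0, 1.0)
              for x in xs]
    return xs


xs = particle_euler()
empirical_mean = sum(xs) / len(xs)
exact_mean = 2.0 * math.exp((KAPPA - 1.0) * 1.0)  # m(t) = m0 exp((KAPPA - 1) t)
print(empirical_mean, exact_mean)
```

The discrepancy between the two printed numbers combines the time-discretization error and a Monte Carlo error of order one over the square root of the number of particles.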

The links with non-equilibrium statistical mechanics, the replicator dynamics of multi-agent evolutionary games (evolutionary biology) and epidemiology were established in [1]; financial models received even more attention in [5] and [10]. The examples are again numerous, as these evolutions exhaust all positivity-preserving evolutions on measures subject to certain mild regularity assumptions. In particular, they include the Vlasov, Boltzmann, Smoluchowski and Landau-Fokker-Planck equations, as well as McKean diffusions and many other models.

Extending the link with usual Markov chains (described above) to general Markov processes with continuous state space, we stress that, for a nonlinear Markov process, the future depends on the past not only via its present position, but also via its present distribution. A nonlinear Markov semigroup can be considered as a nonlinear deterministic dynamical system, though on the unusual state space of measures. To give it a probabilistic interpretation one should specify a stochastic representation for this semigroup in terms of nonlinear transition probabilities satisfying the nonlinear analog of the Chapman-Kolmogorov equation; see Section 2.2 ('Findings for year 2') below for more details.

2.2  Findings  for  year  2 

In  Year  1  the  theory  of  nonlinear  Markov  processes  was  developed  for  discrete  state  space  and 
initiated  for  the  general  case.  In  Year  2  we  completed  this  development. 

Let us describe in more detail the central object of our study: a nonlinear Markov process. Loosely speaking, a nonlinear Markov evolution is just a dynamical system generated by a measure-valued ordinary differential equation (ODE) with the specific feature of preserving positivity. This feature distinguishes it from a general Banach-space-valued ODE and yields a natural link with probability theory, both in interpreting results and in the tools of analysis. The technical complications for the sensitivity analysis, again compared with the standard theory of vector-valued ODEs, lie in the specific unboundedness of the generators, which causes the derivatives of the solutions to nonlinear equations (with respect to parameters or initial conditions) to live in spaces other than that of the evolution itself. From the probabilistic point of view, the first derivative with respect to the initial data (specified by the linearized evolution around a path of the nonlinear dynamics) describes the interacting particle approximation to this nonlinear dynamics (which, in turn, serves as the dynamic law of large numbers for this approximating Markov system of interacting particles), and the second derivative describes the limit of fluctuations of the evolution of particle systems around its law of large numbers (probabilistically, the dynamic central limit theorem).

A more precise definition is as follows. Let $M(X)$ be a dense subset of the space $\mathcal{M}(X)$ of finite (positive Borel) measures on a Polish (complete separable metric) space $X$ (considered in its weak topology). By a nonlinear sub-Markov (resp. Markov) propagator in $M(X)$ we shall mean any propagator $V^{t,r}$ of possibly nonlinear transformations of $M(X)$ that do not increase (resp. preserve) the norm. If $V^{t,r}$ depends only on the difference $t - r$ and hence specifies a semigroup, this semigroup is called nonlinear (or generalized) sub-Markov or Markov, respectively.

The usual, linear, Markov propagators or semigroups correspond to the case when all the transformations are linear contractions in the whole space $\mathcal{M}(X)$. In probability theory these propagators describe the evolution of averages of Markov processes, i.e. processes whose evolution after any given time $t$ depends on the past $X_{\le t}$ only via the present position $X_t$. Loosely speaking, to any nonlinear Markov propagator there corresponds a process whose behavior after any time $t$ depends on the past $X_{\le t}$ via the position $X_t$ of the process and its distribution at $t$.

More precisely, consider the nonlinear equation in the weak form

$$\frac{d}{dt}(g,\mu_t) = (A[\mu_t]g, \mu_t), \qquad g \in C(X), \qquad (3)$$

with a certain family of operators $A[\mu]$ in $C(X)$ depending on $\mu$ as a parameter and such that each $A[\mu]$ specifies a uniquely defined Markov process (say, via the solution to the corresponding martingale problem, or by generating a Feller semigroup).

Suppose that the Cauchy problem for equation (3) is well posed and specifies the weakly continuous Markov semigroup $T_t$ in $M(X)$. Suppose also that for any weakly continuous curve $\mu_t \in \mathcal{P}(X)$ (the set of probability measures on $X$) the solutions to the Cauchy problem of the equation

$$\frac{d}{dt}(g,\nu_t) = (A[\mu_t]g, \nu_t) \qquad (4)$$

define a weakly continuous propagator $V^{t,r}[\mu_\cdot]$, $r \le t$, of linear transformations in $\mathcal{M}(X)$ and hence a Markov process in $X$, with transition probabilities $p^{\mu_\cdot}_{r,t}(x,dy)$. Then to any $\mu \in \mathcal{P}(X)$ there corresponds a (usual linear, but time non-homogeneous) Markov process $X^\mu_t$ in $X$ ($\mu$ stands for the initial distribution) such that its distributions $\nu_t$ solve equation (4), taken along the curve $\mu_t = T_t\mu$, with the initial condition $\mu$; since $T_t\mu$ itself solves (4) along this curve, uniqueness gives $\nu_t = T_t\mu$. We call the family of processes $X^\mu_t$ a nonlinear Markov process. When each $A[\mu]$ generates a Feller semigroup and $T_t$ acts on the whole $\mathcal{M}(X)$ (and not only on its dense subset), the corresponding process can also be called nonlinear Feller. Allowing for the evolution on subsets $M(X)$ is however crucial, as this often occurs in applications, say for the Smoluchowski or Boltzmann equations with unbounded rates.

Thus a nonlinear Markov process is a semigroup of transformations of distributions such that a "tangent" Markov process with the same marginal distributions is attached to each trajectory. The structure of these tangent processes is not intrinsic to the semigroup, but can be specified by choosing a stochastic representation for the generator, that is, for the right-hand side of (4).
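On a finite state space the tangent process can be computed explicitly. In the Python sketch below (a hypothetical three-state Q-matrix with rates 0.2 + μ_j², and hypothetical function names), an Euler scheme for the nonlinear equation dμ/dt = μQ(μ) is run together with the linear equation dP/dt = PQ(μ_t), P(0) = I, in which Q is frozen along the computed trajectory. The rows of P approximate the transition probabilities p_{0,t}(i, ·) of the tangent, time-inhomogeneous Markov chain, and mixing them with the initial law reproduces μ_t.

```python
def q_matrix(mu):
    """Hypothetical position-dependent Q-matrix on {0, 1, 2}:
    rate i -> j (i != j) equal to 0.2 + mu_j ** 2."""
    n = len(mu)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = 0.2 + mu[j] ** 2
        q[i][i] = -sum(q[i][j] for j in range(n) if j != i)
    return q


def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def mat_mat(a, b):
    """Matrix product a @ b."""
    return [vec_mat(row, b) for row in a]


def flow_and_tangent(mu0, dt=1e-3, steps=1000):
    """Euler scheme for the nonlinear equation d mu/dt = mu Q(mu) together with
    the linear tangent equation dP/dt = P Q(mu_t), P(0) = I, where Q is frozen
    along the computed nonlinear trajectory mu_t."""
    n = len(mu0)
    mu = list(mu0)
    trans = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(steps):
        q = q_matrix(mu)
        kernel = [[(1.0 if i == j else 0.0) + dt * q[i][j] for j in range(n)]
                  for i in range(n)]
        trans = mat_mat(trans, kernel)  # tangent transition probabilities p_{0,t}(i, .)
        mu = vec_mat(mu, kernel)        # nonlinear flow
    return mu, trans


mu0 = [0.6, 0.3, 0.1]
mu_t, trans = flow_and_tangent(mu0)
# Mixing the tangent transition probabilities with the initial law
# reproduces the marginal mu_t of the nonlinear evolution.
print(mu_t, vec_mat(mu0, trans))
```

Replacing Q(μ) by another Q-matrix with the same product μQ(μ) for every μ would leave μ_t unchanged but change the matrix of transition probabilities, in line with the remark that the tangent processes are not intrinsic to the semigroup.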

The theoretical issues mentioned above, namely the well-posedness of equations of type (3) and their sensitivity to various parameters, were developed in full in [3], [4], [7]. The development was carried out at the level of generality needed for the applications to many-agent and/or control systems dealt with in Year 3.

2.3  Findings  for  year  3:  main  objectives 

The last three directions indicated for further research (configuration spaces of nontrivial geometry, controlled nonlinear quantum dynamical semigroups, and fully infinite-dimensional measure-valued controlled Markov processes and games) were touched upon in book [1], Sections 11.3 and 11.4, and in book [2], Chapter 6, but were mainly left for future research.

Our main work was around two mainstreams of competitive control problems for nonlinear Markov processes:

1) Each agent has an individual payoff. This leads to mean-field games, initiated for the case of an underlying diffusion process by P. Caines, R. Malhamé and M. Huang. Here we developed the theory for an arbitrary underlying nonlinear Markov process, see [6].

2) Individuals fulfil the objectives of competitive leaders (generals with armies, engineers with robot swarms, etc.); the main results completed so far are presented in [8]. This paper summarized many ideas of this project and, at the same time, opened the road for several further directions of research that were not anticipated at the initial stage of the project.

Let  us  outline  the  theory  for  both  cases  in  more  detail. 

2.4  Findings  for  year  3:  mean-field  games 

Mean-field game (MFG) methodology aims at describing control processes with a large number $N$ of participants by studying the limit $N \to \infty$, when the contribution of each member becomes negligible and their interaction is performed via certain mean-field characteristics, which can be expressed in terms of empirical measures. A characteristic feature of the MFG analysis is the study of a coupled system of a backward equation on functions (the Hamilton-Jacobi-Bellman equation) and a forward equation on probability laws (the Kolmogorov equation). We showed that the machinery of nonlinear Markov processes can serve as a natural tool for studying mean-field games with general underlying Markov dynamics of agents (not only diffusions). More specifically, the main consistency equation of MFG can be looked at as a coupling of a nonlinear Markov process with certain controlled dynamics. Using this link we developed the MFG methodology for a wide class of underlying Markov dynamics, including in particular stable and stable-like processes, as well as their various modifications like tempered stable-like processes or their mixtures with diffusions.

Moreover, our abstract approach yields essential improvements even for underlying processes being diffusions. In particular, it includes the case of diffusion coefficients (not only drifts) depending on empirical measures; it allows us to get rid of the assumption of small coupling (or composite gain), to prove the crucial sensitivity estimates (deriving the regularity of HJB equations from the regularity of the Hamiltonian functions), and finally to get a full proof of the convergence rate of order $1/N$.

Let us now explain the main ideas, objectives and strategy of our analysis. Suppose the position of an agent is described by a point in a locally compact separable metric space $X$. The position of $N$ agents is then given by a point in the power $X^N = X \times \cdots \times X$ ($N$ times). Hence the natural state space for describing a variable (but non-vanishing) number of players is the union $\mathcal{X} = \bigcup_{j=1}^\infty X^j$. We denote by $C_{sym}(X^N)$ the Banach space of symmetric (with respect to permutations of all arguments) bounded continuous functions on $X^N$, and by $C_{sym}(\mathcal{X})$ the corresponding space of functions on the full space $\mathcal{X}$. We denote the elements of $\mathcal{X}$ by bold letters, say $\mathbf{x}$, $\mathbf{y}$.

Reducing the set of observables to $C_{sym}(\mathcal{X})$ means effectively that our state space is not $\mathcal{X}$ (or $X^N$ in the case of a fixed number of particles) but rather the quotient space $S\mathcal{X}$ (or $SX^N$, resp.) obtained with respect to the action of the group of permutations, which allows the identifications $C_{sym}(\mathcal{X}) = C(S\mathcal{X})$ and $C_{sym}(X^N) = C(SX^N)$. Clearly $S\mathcal{X}$ can be identified with the set of all finite collections of points from $X$, the order being irrelevant.

A key role in the theory of measure-valued limits of interacting particle systems is played by the inclusion of $S\mathcal{X}$ into $\mathcal{P}(X)$ (the set of probability laws on $X$) given by

$$\mathbf{x} = (x_1, \dots, x_N) \mapsto \frac{1}{N}(\delta_{x_1} + \cdots + \delta_{x_N}) = \frac{1}{N}\delta_{\mathbf{x}}, \qquad (5)$$

which defines a bijection between $SX^N$ and the subset $\mathcal{P}^N_\delta(X)$ of $\mathcal{P}(X)$ consisting of normalized sums of $N$ Dirac masses. This bijection extends to a mapping of $S\mathcal{X}$ onto

$$\mathcal{P}_\delta(X) := \bigcup_{N=1}^\infty \mathcal{P}^N_\delta(X) \subset \mathcal{P}(X),$$

which can be used to equip $S\mathcal{X}$ with the structure of a metric space by pulling back any distance on $\mathcal{P}(X)$ that is compatible with its weak topology.
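For a finite set X, the map (5) and its inverse on P^N_δ(X) can be written down directly. The Python sketch below (the function names are hypothetical) stores a normalized sum of Dirac masses as a dictionary of exact weights; permuted configurations, i.e. the same point of SX^N, give the same measure, and the configuration is recovered when N is known.

```python
from collections import Counter
from fractions import Fraction


def empirical_measure(points):
    """Map (x_1, ..., x_N) to (1/N)(delta_{x_1} + ... + delta_{x_N}),
    stored as a dict point -> exact weight."""
    n = len(points)
    return {x: Fraction(c, n) for x, c in Counter(points).items()}


def configuration(measure, n):
    """Inverse of the map above on P^N_delta(X): recover the unordered
    configuration (as a sorted tuple) from a normalized sum of n Dirac masses.
    n must be supplied: (x,) and (x, x) give the same measure."""
    points = []
    for x, w in measure.items():
        k = w * n
        if k.denominator != 1:
            raise ValueError("measure is not in P^N_delta(X) for this n")
        points.extend([x] * k.numerator)
    return tuple(sorted(points))


x = ("b", "a", "c", "a")
y = ("a", "c", "a", "b")  # a permutation of x, i.e. the same point of S X^N
print(empirical_measure(x) == empirical_measure(y))  # True
print(configuration(empirical_measure(x), len(x)))   # ('a', 'a', 'b', 'c')
```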

Let $\{A[t,\mu,u]\}$ be a family of generators of Feller processes in $X$, where $t \ge 0$, $\mu \in \mathcal{P}(X)$ and $u \in U$ (a metric space interpreted as the set of admissible controls). Assume also that a mapping $\gamma : \mathbb{R}_+ \times X \to U$ is given. For any $N$, let us define the following (time-dependent) family of operators (pre-generators) on $C_{sym}(X^N)$ describing $N$ mean-field interacting agents:

$$A^N_t[\gamma] f(\mathbf{x}) = A^N_t[\gamma] f(x_1, \dots, x_N) := \sum_{i=1}^N A^i[t, \mu, u_i] f(x_1, \dots, x_N), \qquad (6)$$

where

$$\mu = \frac{1}{N}\sum_{i=1}^N \delta_{x_i} = \frac{1}{N}\delta_{\mathbf{x}}$$

is the empirical distribution of agents, $u_i = \gamma(t, x_i)$, and $A^i[t,\mu,u_i]f$ means the action of the operator $A[t,\mu,u_i]$ on the $i$th variable of the function $f$. Let us assume that the family $A^N_t[\gamma]$ generates a Markov process $\mathbf{X}^N = \{\mathbf{X}^N(t) = (X^N_1(t), \dots, X^N_N(t)) : t \ge 0\}$ on $X^N$ for any $N$.

We shall refer to it as a controlled (via the control $\gamma$) process of $N$ mean-field interacting agents. In the terminology of statistical mechanics, the operator $A_t[\gamma]$ (considered for all $N$, i.e. lifted naturally to the whole space $C_{sym}(\mathcal{X})$) should be called the second quantization of $A[t,\mu,u]$.

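Using mapping (5), we can transfer our process of $N$ mean-field interacting agents from $SX^N$ to $\mathcal{P}^N_\delta(X)$. This leads to the following operator on $C(\mathcal{P}^N_\delta(X))$:

$$A^N_t[\gamma] F(\delta_{\mathbf{x}}/N) = A^N_t[\gamma] f(\mathbf{x}) = \sum_{i=1}^N A^i[t, \delta_{\mathbf{x}}/N, u_i] f(x_1, \dots, x_N), \qquad (7)$$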
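where $f(\mathbf{x}) = F(\delta_{\mathbf{x}}/N)$ and $\mathbf{x} = (x_1, \dots, x_N)$. Let us calculate the action of this operator on linear functionals $F$, that is, on functionals of the form

$$F_g(\mu) = (g, \mu) = \int g(x)\,\mu(dx) \qquad (8)$$

for a $g \in C(X)$. Denoting $g^\oplus(\mathbf{x}) = \sum_{i=1}^N g(x_i)$ for $\mathbf{x} = (x_1, \dots, x_N)$, we get

$$A^N_t[\gamma] F_g(\delta_{\mathbf{x}}/N) = \frac{1}{N}\big(A^N_t[\gamma] g^\oplus\big)(x_1, \dots, x_N) = \frac{1}{N}\sum_{i=1}^N \big(A[t, \delta_{\mathbf{x}}/N, \gamma(t, x_i)]g\big)(x_i) = \big(A[t, \delta_{\mathbf{x}}/N, \gamma(t,\cdot)]g, \delta_{\mathbf{x}}/N\big), \qquad (9)$$

where $(A[t,\mu,\gamma(t,\cdot)]g)(x)$ stands for $(A[t,\mu,\gamma(t,x)]g)(x)$.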
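Identity (9) can be checked exactly for a finite state space X = {0, 1, 2}, where each A[t, μ, u] is a Q-matrix acting by (Ag)(x) = Σ_y q_xy (g(y) - g(x)). In the Python sketch below the rates, the feedback `gamma` and the test function g are hypothetical, and time is suppressed. Using exact fractions, the N-agent generator (6) applied to the linear functional F_g agrees with (A[δ_x/N, γ(·)]g, δ_x/N).

```python
from fractions import Fraction

STATES = (0, 1, 2)


def rates(mu, u):
    """Hypothetical controlled Q-matrix A[mu, u] on X = {0, 1, 2}:
    the rate i -> j (i != j) is u + mu(j)."""
    q = {}
    for i in STATES:
        for j in STATES:
            if i != j:
                q[i, j] = Fraction(u) + mu[j]
        q[i, i] = -sum(q[i, j] for j in STATES if j != i)
    return q


def apply_generator(mu, u, g, x):
    """(A[mu, u] g)(x) = sum_y q_{xy} (g(y) - g(x))."""
    q = rates(mu, u)
    return sum(q[x, y] * (g[y] - g[x]) for y in STATES)


def gamma(x):
    """Hypothetical feedback control gamma(x) (time suppressed)."""
    return 1 + x


def n_agent_generator(f, config):
    """(A^N[gamma] f)(x) = sum_i A^i[mu, gamma(x_i)] f(x) as in (6):
    agent i jumps alone, with rates evaluated at the empirical measure mu."""
    n = len(config)
    mu = {s: Fraction(config.count(s), n) for s in STATES}
    total = Fraction(0)
    for i, xi in enumerate(config):
        q = rates(mu, gamma(xi))
        for y in STATES:
            moved = config[:i] + (y,) + config[i + 1:]
            total += q[xi, y] * (f(moved) - f(config))
    return total, mu


g = {0: Fraction(3), 1: Fraction(-1), 2: Fraction(5)}


def linear_functional(config):
    """F_g(delta_x / N) = (g, delta_x / N), cf. (8)."""
    return sum(g[s] for s in config) / Fraction(len(config))


config = (0, 2, 2, 1, 0)
lhs, mu = n_agent_generator(linear_functional, config)
rhs = sum(mu[x] * apply_generator(mu, gamma(x), g, x) for x in STATES)
print(lhs == rhs)  # True: both sides of (9) agree exactly
```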


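Hence, if $\mu^N_t = \delta_{\mathbf{x}}/N \to \mu_t \in \mathcal{P}(X)$ as $N \to \infty$, we have

$$A^N_t[\gamma] F_g(\delta_{\mathbf{x}}/N) \to \big(A[t, \mu_t, \gamma(t,\cdot)]g, \mu_t\big), \qquad N \to \infty,$$

so that the evolution equation

$$\dot F_t = A^N_t[\gamma] F_t \qquad (10)$$

of our controlled process of $N$ mean-field interacting agents, for linear functionals of the form $F_t = (g, \mu^N_t)$, turns into the equation

$$\frac{d}{dt}(g, \mu_t) = \big(A[t, \mu_t, \gamma(t,\cdot)]g, \mu_t\big), \qquad \mu_0 = \mu. \qquad (11)$$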


We  call  this  equation  the  general  kinetic  equation  in  weak  form.  It  should  hold  for  g  from 
a  suitable  class  of  test  functions.  This  limiting  procedure  will  be  discussed  in  detail  later  on. 
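The limiting procedure can be illustrated by a simulation. The Python sketch below (a hypothetical three-state Q-matrix with rates 0.2 + μ_j², no control for simplicity, hypothetical function names) compares the empirical measure of N agents, whose jump rates depend on the current empirical measure, with an Euler solution of the kinetic equation (11). It uses a simple time-discretized jump approximation of the particle system rather than an exact simulation.

```python
import random

N_STATES = 3


def q_matrix(mu):
    """Hypothetical Q-matrix A[mu] on X = {0, 1, 2}: i -> j at rate 0.2 + mu_j ** 2."""
    q = [[0.0] * N_STATES for _ in range(N_STATES)]
    for i in range(N_STATES):
        for j in range(N_STATES):
            if i != j:
                q[i][j] = 0.2 + mu[j] ** 2
        q[i][i] = -sum(q[i][j] for j in range(N_STATES) if j != i)
    return q


def kinetic_euler(mu, dt, steps):
    """Euler scheme for the kinetic equation d mu/dt = mu Q(mu), cf. (11)."""
    for _ in range(steps):
        q = q_matrix(mu)
        mu = [mu[j] + dt * sum(mu[i] * q[i][j] for i in range(N_STATES))
              for j in range(N_STATES)]
    return mu


def empirical(config):
    """Empirical measure of a list of agent states."""
    return [config.count(s) / len(config) for s in range(N_STATES)]


def simulate_agents(config, dt, steps, rng):
    """N mean-field interacting agents: during each small step dt an agent at i
    jumps to j != i with probability dt * Q(mu^N)_{ij}, where mu^N is the
    empirical measure at the start of the step."""
    for _ in range(steps):
        q = q_matrix(empirical(config))
        new_config = []
        for i in config:
            u, acc, target = rng.random(), 0.0, i
            for j in range(N_STATES):
                if j != i:
                    acc += dt * q[i][j]
                    if u < acc:
                        target = j
                        break
            new_config.append(target)
        config = new_config
    return empirical(config)


dt, steps = 0.01, 100
config = [0] * 3000 + [1] * 1500 + [2] * 500  # N = 5000 agents
kin = kinetic_euler(empirical(config), dt, steps)
emp = simulate_agents(config, dt, steps, random.Random(0))
print(emp, kin)
```

In this discretization the conditional mean of one particle step coincides with one Euler step of the kinetic equation, so the printed vectors differ mainly by fluctuations of order one over the square root of N, in line with the law of large numbers discussed in T3) below.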

Let us explain how the mapping $\gamma$ arises from individual controls. Assume that the objective of each agent is to maximize (over a suitable class of controls $\{u_\cdot\}$) the payoff

$$\mathbb{E}\left[\int_t^T J(s, X^N_i(s), \mu^N_s, u_s)\,ds + V^T(X^N_i(T))\right],$$

consisting of running and final components, where the functions $J : \mathbb{R}_+ \times X \times \mathcal{P}(X) \times U \to \mathbb{R}$ and $V^T : X \to \mathbb{R}$ and the final time $T$ are given, and where $\{\mu^N_\cdot\}$ is the family of empirical measures of the whole process:

$$\mu^N_s = \frac{1}{N}\delta_{\mathbf{X}^N(s)}, \qquad t \le s \le T.$$

By dynamic programming (and assuming appropriate regularity), if the dynamics of the empirical measures $\mu^N_s$ is given, the optimal payoff

$$V^N(t,x) = \sup \mathbb{E}\left[\int_t^T J(s, X(s), \mu^N_s, u_s)\,ds + V^T(X(T))\right]$$

of an agent starting at $x$ at time $t$ should satisfy the HJB equation

$$\frac{\partial V^N(t,x)}{\partial t} + \max_u \big(J(t, x, \mu^N_t, u) + A[t, \mu^N_t, u]V^N(t,x)\big) = 0 \qquad (12)$$

with the terminal condition $V^N(T,\cdot) = V^T(\cdot)$. If $\mu^N_t \to \mu_t \in \mathcal{P}(X)$ as $N \to \infty$, then it is reasonable to expect that the solution of (12) converges to the solution of the equation

$$\frac{\partial V(t,x)}{\partial t} + \max_u \big(J(t, x, \mu_t, u) + A[t, \mu_t, u]V(t,x)\big) = 0. \qquad (13)$$


Assume that HJB equation (13) is well posed and that the max is achieved at one point only. Let us denote this point of maximum by $u = \Gamma(t, x, \{\mu_{\ge t}\})$.

Thus, if each agent chooses the control via HJB (13), given a flow of measures $\{\mu_t\}$, i.e. with

$$\gamma(t,x) = \Gamma(t, x, \{\mu_{\ge t}\}), \qquad (14)$$

this $\gamma$ specifies a nonlinear Markov evolution $\{\tilde\mu_t\}_{t\ge 0}$ via kinetic equation (11). The corresponding MFG consistency (or fixed point) condition $\{\tilde\mu_\cdot\} = \{\mu_\cdot\}$ leads to the equation

$$\frac{d}{dt}(g, \mu_t) = \big(A[t, \mu_t, \Gamma(t, \cdot, \{\mu_{\ge t}\})]g, \mu_t\big), \qquad \mu_0 = \mu, \qquad (15)$$

which expresses the coupling of the nonlinear Markov process specified by (15) and the optimal control problem specified by HJB (13). It is now reasonable to expect that if the number of agents $N$ tends to infinity in such a way that the limiting evolution is well defined and satisfies the limiting equation (15) with $\Gamma$ chosen via the solution of the above HJB equation, then the control $\gamma$ and the corresponding payoffs represent an $\epsilon$-Nash equilibrium for the controlled system of $N$ agents, with $\epsilon \to 0$ as $N \to \infty$. This statement (or conjecture) represents the essence of the MFG methodology.
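The fixed-point structure of (13)-(15) can be sketched in a toy discrete setting. The Python sketch below is an assumption-laden illustration, not the method of [6]: it uses three states, discrete time, a crowd-averse running reward -α μ_t(x), a moving cost and a terminal payoff (all hypothetical), and it replaces the exact max in the Bellman recursion by a soft-max with temperature τ (entropy regularization), so that the feedback is single-valued, in the spirit of the assumption that the max is attained at one point. Each iteration solves the backward problem for the current flow and then the forward equation for the induced flow, mirroring the coupling of HJB (13) with the kinetic equation.

```python
import math

STATES = (0, 1, 2)
T = 4             # number of decision epochs (hypothetical)
ALPHA = 0.2       # crowd aversion: running reward -ALPHA * mu_t(x)
TAU = 1.0         # temperature of the soft-max replacing the exact max in (13)
MOVE_COST = 0.5   # cost per unit of distance moved
TERMINAL = {0: 0.0, 1: 0.5, 2: 1.0}  # hypothetical final payoff V^T


def best_response(flow):
    """Backward (HJB-type) step: soft Bellman recursion for a given flow
    {mu_t}; returns feedback policies pi[t][x][j] = P(move x -> j)."""
    v = dict(TERMINAL)
    policies = [None] * T
    for t in reversed(range(T)):
        new_v, pol = {}, {}
        for x in STATES:
            q = {j: -ALPHA * flow[t][x] - MOVE_COST * abs(x - j) + v[j]
                 for j in STATES}
            m = max(q.values())
            z = sum(math.exp((qj - m) / TAU) for qj in q.values())
            pol[x] = {j: math.exp((q[j] - m) / TAU) / z for j in STATES}
            new_v[x] = m + TAU * math.log(z)  # soft-max value
        policies[t] = pol
        v = new_v
    return policies


def induced_flow(mu0, policies):
    """Forward (Kolmogorov) step: mu_{t+1}(j) = sum_x mu_t(x) pi_t(j | x)."""
    flow = [dict(mu0)]
    for t in range(T):
        flow.append({j: sum(flow[t][x] * policies[t][x][j] for x in STATES)
                     for j in STATES})
    return flow


def solve_mfg(mu0, iterations=200):
    """Picard iteration on the consistency condition {mu~} = {mu}."""
    flow = [dict(mu0) for _ in range(T + 1)]
    for _ in range(iterations):
        flow = induced_flow(mu0, best_response(flow))
    return flow


mu0 = {0: 0.7, 1: 0.2, 2: 0.1}
flow = solve_mfg(mu0)
check = induced_flow(mu0, best_response(flow))
residual = max(abs(a[x] - b[x]) for a, b in zip(flow, check) for x in STATES)
print(residual)  # close to zero: the flow reproduces itself
```

For the small coupling chosen here the map from a flow to its induced flow is a contraction, so plain iteration converges; for stronger coupling it may fail to converge, and damping or other schemes would be needed. The sketch says nothing about the ε-Nash property for finite N, which is the content of the tasks listed next.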

Under certain assumptions on the family $A[t,\mu,u]$, we justify this claim by carrying out the following tasks:

T1) Proving the existence of solutions to the Cauchy problem for the coupled kinetic equations (15) within an appropriate class of feedbacks $\Gamma$, and the well-posedness for the uncoupled equations (11). Notice that we are not claiming uniqueness for (15). It is difficult to expect this, as Nash equilibria are in general not unique. At the same time, it seems to be an important open problem to better understand this non-uniqueness by describing and characterizing specific classes of solutions. On the other hand, well-posedness for the uncoupled equations (11) is crucial for further analysis.

T2) Proving the well-posedness of the Cauchy problem for the (backward) HJB equation (13), for an arbitrary flow $\{\mu_\cdot\}$ in some class of regularity, yielding the feedback function $\Gamma$ in the class required by T1). This should include some sensitivity analysis of $\Gamma$ with respect to the functional parameter $\{\mu_\cdot\}$, which is needed to show that approximating the limiting MFG distribution $\{\mu_\cdot\}$ by approximate $N$-particle empirical measures also yields an approximate optimal control. To perform this task, we assume additionally that the operators $A[t,\mu,u]$ in (13) can be decomposed into the sum of a controlled first-order term and a term that does not depend on the control and generates a propagator with certain smoothing properties. This simplifying assumption allows one to work out the theory with classical (or at least mild) solutions of HJB equations. Without this assumption, one would have to face additional technical complications related to viscosity solutions.

T3) Showing the convergence of the $N$-particle approximations, given by generators (9), to the limiting evolution (11), i.e. the dynamic law of large numbers (LLN), for a class of controls $\gamma$ arising from (14) with a fixed $\{\mu_\cdot\}$, where $\Gamma$ is from the class required for the validity of T1) and T2). Here one can use either the more probabilistic approach via compactness and tightness (on Skorokhod path spaces), or a more analytic method via semigroups of linear operators on continuous functionals of measures. We use the second method, as it yields more precise convergence rates. For the analysis of the convergence of the corresponding semigroups, the crucial ingredient is the analysis of the smoothness (sensitivity) of the solutions to kinetic equations (11) with respect to the initial data. The rates of convergence in the LLN directly imply the corresponding rather precise estimates for the so-called propagation of chaos property of interacting particles.

T4) Finally, combining T2) and T3), one has to show that the strategic profile (14) thus obtained, with $\{\tilde\mu_\cdot\} = \{\mu_\cdot\}$, represents an $\epsilon$-equilibrium for the $N$-agent system with $\epsilon \to 0$ as $N \to \infty$. In fact, we are going to prove this with $\epsilon = 1/N$, using the method of tagged particles in our control setting.

This  program  is  carried  out  under  rather  general  assumptions  in  the  extensive  preprint  [6]. 

Let  us  specify  our  model  a  bit  further. 

Of particular interest are the models with the one-particle space $X$ having a spatial and a discrete component, the latter interpreted as the type of an agent. Thus let $X = \mathbb{R}^d \times \mathcal{K}$, where $\mathcal{K}$ is either a finite or a denumerable set. In this case, functions from $C(X)$ can be represented by sequences $f = (f_i)_{i\in\mathcal{K}}$ with each $f_i \in C(\mathbb{R}^d)$. Similarly, the probability laws on $X$ are given by the sequences $\mu = (\mu_i)_{i\in\mathcal{K}}$ of positive measures on $\mathbb{R}^d$ with masses summing to one.

The operators $A$ in $C(X)$ are specified by operator-valued matrices $\{A_{ij}\}$, $i,j \in \mathcal{K}$, with $A_{ij}$ being an operator in $C(\mathbb{R}^d)$, so that $(Af)_i = \sum_{j\in\mathcal{K}} A_{ij} f_j$. It is not difficult to show the following. For such a matrix $A$ to define a conditionally positive conservative operator in $C(X)$ (in particular, a generator of a Feller process), it is necessary that $A_{ij}$ for $i \neq j$ are integral operators

$$ (A_{ij} f)(z) = \int_{\mathbb{R}^d} \big(f_j(y) - f_i(z)\big)\, \nu_{ij}(z,dy) $$

with a bounded (for each $z$) measure $\nu_{ij}(z,dy)$. The diagonal terms are then given by the Lévy-Khintchine type operators ($i \in \mathcal{K}$):

$$ A_i f(z) = \tfrac12 (G_i(z)\nabla,\nabla) f(z) + (b_i(z), \nabla f(z)) + \int_{\mathbb{R}^d} \big(f(z+y) - f(z) - (\nabla f(z), y)\mathbf{1}_{B_1}(y)\big)\, \nu_i(z,dy), \tag{16} $$


with $G_i(z)$ being a symmetric non-negative matrix and $\nu_i(z,\cdot)$ being a Lévy measure on $\mathbb{R}^d$, i.e.

$$ \int_{\mathbb{R}^d} \min(1, |y|^2)\, \nu_i(z,dy) < \infty, \qquad \nu_i(z,\{0\}) = 0, \tag{17} $$

depending measurably on $z$, and where $\mathbf{1}_{B_1}$ denotes, as usual, the indicator function of the unit ball in $\mathbb{R}^d$.
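To make the three parts of (16) concrete, here is a minimal sketch in dimension $d = 1$. It assumes that the Lévy measure $\nu_i(z,dy)$ has finitely many atoms, a special case chosen only so that the integral becomes a finite sum. The function name `levy_khintchine_1d` and its argument layout are hypothetical, and the derivatives of $f$ are supplied by hand.

```python
from typing import Callable, Sequence, Tuple


def levy_khintchine_1d(
    f: Callable[[float], float],
    df: Callable[[float], float],
    d2f: Callable[[float], float],
    z: float,
    G: float,
    b: float,
    jumps: Sequence[Tuple[float, float]],
) -> float:
    """Evaluate the operator (16) at the point z in dimension d = 1.

    G and b are the values G_i(z) >= 0 and b_i(z) at this point. The Levy measure
    nu_i(z, dy) is assumed (hypothetically) to be a finite sum of atoms
    sum_k w_k * delta_{y_k} with y_k != 0 and w_k >= 0, passed as pairs (y_k, w_k).
    """
    diffusion = 0.5 * G * d2f(z)  # (1/2)(G_i(z) grad, grad) f(z)
    drift = b * df(z)             # (b_i(z), grad f(z))
    jump = 0.0
    for y, w in jumps:
        # compensator (grad f(z), y) 1_{B_1}(y): only jumps inside the unit ball are compensated
        compensator = df(z) * y if abs(y) <= 1.0 else 0.0
        jump += w * (f(z + y) - f(z) - compensator)
    return diffusion + drift + jump
```

For $f(z) = z^2$ at $z = 1$, with $G = 0.5$, $b = 2$ and atoms of weight 1 at $y = 0.5$ and weight 0.25 at $y = 2$, the three parts contribute $0.5$, $4$ and $2.25$, giving $6.75$. Constants are annihilated, as they should be for a conservative generator.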

Operators $A_{ij}$ with $i \neq j$ describe the mutation (migration) between the types. If mutations are not allowed, $A$ will be given by a diagonal matrix with the diagonal terms $A_i = A_{ii}$ of type (16).

Let us assume additionally that each agent can control only its drift, that is, the diagonal generators have the form

$$ A_i[t,\mu,u] f(z) = (h_i(t,z,\mu,u), \nabla f(z)) + L_i[t,\mu] f(z), \qquad i = 1,\dots,K, \tag{18} $$
with $L_i$ of the form (16), i.e.

$$ L_i[t,\mu] f(z) = \tfrac12 (G_i(t,z,\mu)\nabla,\nabla) f(z) + (b_i(t,z,\mu), \nabla f(z)) + \int_{\mathbb{R}^d} \big(f(z+y) - f(z) - (\nabla f(z), y)\mathbf{1}_{B_1}(y)\big)\, \nu_i(t,z,\mu,dy), \tag{19} $$

with the coefficients $G_i, b_i, \nu_i$ depending on $t \in \mathbb{R}_+$ and $\mu = (\mu_1,\dots,\mu_K) \in \mathcal{P}(X)$ as parameters.

Suppose that, for a given (probability) measure flow $\{\mu_t\}_{t\in[0,T]}$, the operators $L[t,\mu_t] = (L_1,\dots,L_K)[t,\mu_t]$ generate a Markov process $\{R_t[\mu_t]\}_{t\in[0,T]} = \{(R^1_t[\mu_t],\dots,R^K_t[\mu_t])\}_{t\in[0,T]}$. Then one can write a stochastic differential equation (SDE) corresponding to the generator given in (18) as

$$ dX^i_t = h_i(t, X^i_t, \mu_t, u^i_t)\, dt + dR^i_t[\mu_t], \qquad i = 1,\dots,K. $$

If $\mu_t$ are required to coincide with the laws of $X_t$ for all $t \in [0,T]$, these equations take the form of SDEs driven by nonlinear Lévy noises, developed in [1], [3], [7].
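To illustrate the nonlinear (mean-field) structure, consider a hypothetical special case: a single type ($K = 1$) in $d = 1$, with $R_t$ a Brownian motion with constant intensity $\sigma$ and drift $h(x,\mu,u) = u - \kappa\,(x - \int y\,\mu(dy))$. The law $\mu_t$ is replaced by the empirical measure of $N$ interacting particles, and the SDE is discretized by the Euler-Maruyama scheme. This is a generic particle sketch, not a construction taken from [1], [3], [7].

```python
import math
import random
from typing import List


def simulate_mean_field_sde(
    x0: List[float], T: float, n_steps: int,
    kappa: float, u: float, sigma: float, seed: int = 0,
) -> List[float]:
    """Euler-Maruyama for dX^k = (u - kappa (X^k - mean(mu^N_t))) dt + sigma dW^k.

    mu^N_t is the empirical measure of the particles X^1_t, ..., X^N_t, which stands in
    for the law of X_t (hypothetical drift and noise, chosen for illustration only).
    """
    rng = random.Random(seed)
    x = list(x0)
    dt = T / n_steps
    sq = math.sqrt(dt)
    for _ in range(n_steps):
        m = sum(x) / len(x)  # first moment of the empirical measure
        x = [xk + (u - kappa * (xk - m)) * dt + sigma * sq * rng.gauss(0.0, 1.0) for xk in x]
    return x
```

With $\sigma = 0$, the interaction term sums to zero over the particles, so the empirical mean moves exactly at speed $u$ while the spread contracts at rate $\kappa$.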

The initial work on mean field games, done by Lions et al. and Caines et al., dealt with the processes $R_t[\mu]$ being Brownian motions without dependence on $\mu$. In our framework, this underlying process is extended to an arbitrary Markov process with a generator (19) depending on $\mu$.

In the main kinetic equation (11), we shall then have $(g,\mu_t) = \sum_{i=1}^K (g_i, \mu_{i,t})$ and

$$ A[t,\mu_t,\gamma(t,\cdot)] g = \{A_i[t,\mu_t,\gamma(t,\cdot)] g_i\}_{i=1}^K $$

with

$$ A_i[t,\mu,\gamma(t,\cdot)] g_i(z) = (h_i(t,z,\mu,\gamma(t,\cdot)), \nabla g_i(z)) + L_i[t,\mu] g_i(z). \tag{20} $$

The HJB equation (13) now decomposes into a collection of HJB equations for each class of agents, written as

$$ \frac{\partial V^i(t,x)}{\partial t} + H^i_t(x, \nabla V^i(t,x), \mu_t) + L_i[t,\mu_t] V^i(t,x) = 0, \tag{21} $$

where

$$ H^i_t(x,p,\mu_t) := \max_{u\in U} \{ h_i(t,x,\mu_t,u)\, p + J_i(t,x,\mu_t,u) \}. \tag{22} $$

We have assumed that the resulting feedback control is unique (i.e. the argmax in (22) is unique). The basic example of such a situation is given by $H_\infty$-optimal control problems, where, for each $i$, the running cost function $J_i$ is quadratic in $u$, i.e.

$$ J_i(t,x,\mu,u) = \alpha_i(t,x,\mu) - \theta_i(t,x,\mu) u^2, $$

and the drift coefficient $h_i$ is linear in $u$, i.e.

$$ h_i(t,x,\mu,u) = \beta_i(t,x,\mu) u, $$

where $\alpha_i, \beta_i, \theta_i : [0,T] \times \mathbb{R}^d \times \mathcal{P}(\mathbb{R}^d) \to \mathbb{R}$ and $\theta_i(t,x,\mu) > 0$ for any $(t,x,\mu)$. Thus the explicit formula for the unique point of maximum becomes available:

$$ u = \frac{\beta_i(t,x,\mu)\, p}{2\theta_i(t,x,\mu)}, $$

and the HJB equation (21) can be rewritten as

$$ \frac{\partial V^i}{\partial t} + \frac{\beta_i^2(t,x,\mu_t)}{4\theta_i(t,x,\mu_t)} \Big(\frac{\partial V^i}{\partial x}\Big)^2 + \alpha_i(t,x,\mu_t) + L_i[t,\mu_t] V^i(t,x) = 0, $$

which is a generalized backward Burgers equation. Another natural example is the situation where, for each $i$, $h_i(t,x,\mu,u) = u$ and $J_i(t,x,\mu,u)$ is a strictly concave smooth function of $u$. Then $H^i_t$ is the Legendre transform of $-J_i$ as a function of $u$. The unique point of maximum in (22) is therefore $u = \partial H^i_t/\partial p$, and the kinetic equation (15) takes the form

$$ \frac{d}{dt}(g_i, \mu_{i,t}) = \Big( L_i[t,\mu_t] g_i + \frac{\partial H^i_t}{\partial p}(x,p,\mu_t)\Big|_{p=\nabla V^i(t,x)} \nabla g_i,\ \mu_{i,t} \Big). \tag{23} $$
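Returning to the quadratic example above, the explicit maximizer can be checked numerically. The sketch below treats the scalar case, so that $\beta_i p$ is an ordinary product. It computes $u = \beta_i p/(2\theta_i)$ and the resulting Hamiltonian value $\alpha_i + \beta_i^2 p^2/(4\theta_i)$ from (22), and compares that value with a brute-force maximization over a grid of controls. The function names are hypothetical.

```python
from typing import Iterable, Tuple


def lq_maximizer(alpha: float, beta: float, theta: float, p: float) -> Tuple[float, float]:
    """Return (u*, H) for max_u [beta*u*p + alpha - theta*u**2] with theta > 0."""
    if theta <= 0:
        raise ValueError("theta must be positive for a unique maximum")
    u_star = beta * p / (2.0 * theta)                     # first-order condition
    hamiltonian = alpha + (beta * p) ** 2 / (4.0 * theta)  # value of the maximum
    return u_star, hamiltonian


def brute_force_hamiltonian(alpha: float, beta: float, theta: float, p: float,
                            controls: Iterable[float]) -> float:
    """Maximize the same expression over a finite set of controls."""
    return max(beta * u * p + alpha - theta * u * u for u in controls)
```

For $\alpha = 1$, $\beta = 2$, $\theta = 0.5$ and $p = 1.5$, this gives $u = 3$ and $H = 5.5$.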

2.5  Findings  for  year  3:  centralized  control 

Let us turn to the discussion of centralized controls. We consider here only the case of a finite initial state space, for which the theory is fully developed so far. In this case the corresponding space of measures becomes a finite-dimensional Euclidean space (more precisely, its positive orthant $\mathbb{R}^d_+$). Hence the limiting measure-valued evolution becomes a deterministic control process or a differential game in $\mathbb{R}^d_+$. Let us show how the identification of the deterministic limit is carried out and formulate the main results on convergence, referring to [8] for full proofs. We shall assume that there is a fixed number of players $\{1,\dots,K\}$, each controlling a stochastic system consisting of a large number of components, $N_1,\dots,N_K \to \infty$ respectively. These can be generals controlling armies, engineers controlling robot swarms, managers of large banks controlling subsidiaries, etc. The components can interact among themselves and with agents of other groups. The limit $N_1,\dots,N_K \to \infty$ will be described by a differential game in $\mathbb{R}^d_+$.

Recall the standard notation $C^k(\Omega)$, $k \in \mathbb{N}$, for the Banach space of $k$ times continuously differentiable functions in the interior of $\Omega \subset \mathbb{R}^d$ such that $f$ and all its derivatives up to and including order $k$ have a continuous and bounded extension to the closure of $\Omega$. This space is equipped with the norm $\|f\|_{C^k(\Omega)}$, which is the sum of the sup-norms of $f$ and all its derivatives up to and including order $k$. For $\alpha \in (0,1]$, we denote by $C^{k,\alpha}(\Omega)$ the subspace of $C^k(\Omega)$ consisting of functions whose $k$th order derivatives are Hölder continuous of index $\alpha$. The Banach norm on this space is defined as the sum of the norm in $C^k(\Omega)$ and the minimal Hölder constant.

Law  of  large  numbers  for  interacting  Markov  chains. 

Let us first recall the basic setting of mean-field interacting particle systems with a finite number of types. Suppose our initial state space is a finite set $\{1,\dots,d\}$, which can be interpreted as the types of particles (say, possible opinions of individuals on a certain subject, the levels of fitness in a military unit, or the types of robots in a robot swarm). Let $\{Q(t,x)\} = \{(Q_{ij})(t,x)\}$ be a family of $d \times d$ square Q-matrices, or Kolmogorov matrices, depending continuously on a vector $x$ from the closed simplex

$$ \Sigma_d = \Big\{ x = (x_1,\dots,x_d) \in \mathbb{R}^d_+ : \sum_{j=1}^d x_j = 1 \Big\}. $$

Here a Q-matrix is one whose non-diagonal elements are non-negative and whose rows each sum to zero.
The dependence of $Q$ on time $t > 0$ is assumed to be piecewise continuous. For any $x$, the family $\{Q(\cdot,x)\}$ specifies a Markov chain on the state space $\{1,\dots,d\}$ with the generator

$$ (Q(t,x)f)_n = \sum_{m\neq n} Q_{nm}(t,x)(f_m - f_n), \qquad f = (f_1,\dots,f_d), $$

and with the intensity of jumps from state $i$ being

$$ |Q_{ii}(t,x)| = -Q_{ii}(t,x) = \sum_{j\neq i} Q_{ij}(t,x). $$

In other words, the transition matrices $P(s,t,x) = (P_{ij}(s,t,x))_{i,j=1}^d$ of this chain satisfy the Kolmogorov forward equations

$$ \frac{d}{dt} P_{ij}(s,t,x) = \sum_{l=1}^d P_{il}(s,t,x)\, Q_{lj}(t,x), \qquad s < t. $$

Suppose we have a large number of particles distributed arbitrarily among the types $\{1,\dots,d\}$. More precisely, our state space $S$ is $\mathbb{Z}^d_+$, the set of sequences of $d$ non-negative integers $N = (n_1,\dots,n_d)$, where each $n_i$ specifies the number of particles in the state $i$. Let $|N|$ denote the total number of particles in state $N$: $|N| = n_1 + \dots + n_d$. For $i \neq j$ and a state $N$ with $n_i > 0$, denote by $N^{ij}$ the state obtained from $N$ by removing one particle of type $i$ and adding a particle of type $j$. That is, $n_i$ and $n_j$ are changed to $n_i - 1$ and $n_j + 1$ respectively. The mean-field interacting particle system specified by the family $\{Q\}$ is defined as the Markov process on $S$ specified by the generator

$$ L_t f(N) = \sum_{i,j=1}^d n_i Q_{ij}(t, N/|N|)\,[f(N^{ij}) - f(N)], \tag{24} $$

with the convention $N^{ii} = N$, so that the diagonal terms vanish.

A probabilistic description of this process is as follows. Starting from any time and current state $N$, one attaches to each particle a $|Q_{ii}|(N/|N|)$-exponential random waiting time, where $i$ is the type of this particle. If the shortest of the waiting times, $\tau$, turns out to be attached to a particle of type $i$, this particle jumps to a state $j$ according to the distribution $(Q_{ij}/|Q_{ii}|)(N/|N|)$. Briefly, with this distribution and at rate $|Q_{ii}|(N/|N|)$, any particle of type $i$ can turn (migrate) into a type $j$. After any such transition the process starts again from the new state $N^{ij}$. Since the number of particles $|N|$ is preserved by any jump, this process is in fact a Markov chain with a finite state space.
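This probabilistic description translates directly into a Gillespie-type (stochastic simulation) algorithm. The sketch assumes that $Q$ depends on $x$ only, with no explicit $t$-dependence, so that the rates stay constant between jumps and the simulation is exact. The names and the "opinion" rates in the usage example are hypothetical.

```python
import random
from typing import Callable, List, Sequence

QField = Callable[[List[float]], List[List[float]]]


def _sample_index(weights: Sequence[float], rng: random.Random) -> int:
    """Draw k with probability weights[k] / sum(weights)."""
    r = rng.random() * sum(weights)
    acc = 0.0
    for k, w in enumerate(weights):
        acc += w
        if r < acc:
            return k
    return max(k for k, w in enumerate(weights) if w > 0)  # guard against rounding


def simulate_mean_field_chain(Q: QField, N0: Sequence[int], T: float, seed: int = 0) -> List[int]:
    """Simulate the chain generated by (24) on [0, T] for Q(x) without explicit t-dependence."""
    rng = random.Random(seed)
    N = list(N0)
    total = sum(N)
    d = len(N)
    t = 0.0
    while True:
        x = [n / total for n in N]
        q = Q(x)
        # each type-i particle leaves at rate |Q_ii|(x), so type i fires at rate n_i |Q_ii|(x)
        rates = [N[i] * -q[i][i] for i in range(d)]
        R = sum(rates)
        if R <= 0.0:
            break                      # no particle can move any more
        t += rng.expovariate(R)        # minimum of the independent exponential clocks
        if t > T:
            break
        i = _sample_index(rates, rng)
        # target type j drawn from the law Q_ij / |Q_ii|
        j = _sample_index([q[i][k] if k != i else 0.0 for k in range(d)], rng)
        N[i] -= 1
        N[j] += 1                      # move to the state N^{ij}
    return N
```

Sampling the total waiting time first and then the type uses the fact that the minimum of independent exponential clocks is exponential with the summed rate, and that the clock which rings is chosen proportionally to its rate.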


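Remark 1 Yet another way of describing the chain generated by $L_t$ is via the forward Kolmogorov (or master) equation for its transition probabilities $P_{MN}(s,t)$:

$$ \frac{d}{dt} P_{MN}(s,t) = \sum_{i\neq j} (n_i + 1)\, Q_{ij}\Big(t, \frac{N^{ji}}{|N|}\Big) P_{MN^{ji}}(s,t) - \sum_{i\neq j} n_i Q_{ij}\Big(t, \frac{N}{|N|}\Big) P_{MN}(s,t), \qquad s < t, $$

where the terms with $n_j = 0$ are absent from the first sum.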
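For small $|N|$, the chain can be written out as an explicit matrix whose rows sum to zero. The master equation then becomes a linear ODE $\dot p = pG$ for a row vector $p$ of probabilities. The following brute-force sketch builds this matrix for $Q$ without explicit $t$-dependence. Its cost grows quickly with $d$ and $|N|$, so it is meant for illustration only. The helper names and the $Q$ used in the check are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

State = Tuple[int, ...]


def states_with_total(n: int, d: int) -> List[State]:
    """All N in Z^d_+ with |N| = n (brute-force enumeration, small n and d only)."""
    return [N for N in product(range(n + 1), repeat=d) if sum(N) == n]


def chain_generator(Q: Callable[[List[float]], List[List[float]]], n: int, d: int):
    """Matrix of the generator (24) for time-independent Q(x) on {N : |N| = n}."""
    states = states_with_total(n, d)
    index: Dict[State, int] = {N: a for a, N in enumerate(states)}
    G = [[0.0] * len(states) for _ in states]
    for N in states:
        q = Q([ni / n for ni in N])
        a = index[N]
        for i in range(d):
            for j in range(d):
                if i == j or N[i] == 0:
                    continue
                M = list(N)
                M[i] -= 1
                M[j] += 1                    # the state N^{ij}
                rate = N[i] * q[i][j]
                G[a][index[tuple(M)]] += rate
                G[a][a] -= rate
    return states, G


def master_equation_step(p: List[float], G: List[List[float]], dt: float) -> List[float]:
    """One explicit Euler step of dp/dt = p G (the forward equation of Remark 1)."""
    return [p[b] + dt * sum(p[a] * G[a][b] for a in range(len(p))) for b in range(len(p))]
```

The inflow term of Remark 1 is the sum of $p_K G_{KN}$ over $K \neq N$, and the outflow term is $p_N G_{NN}$.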
To shorten the formulas, we shall denote the inverse number of particles by $h$, that is, $h = 1/|N|$. Let $\Sigma^h_d$ be the subset of $\Sigma_d$ with coordinates proportional to $h$. Normalizing the states to $N/|N| \in \Sigma^h_d$ leads to the generator of the form

$$ L^h_t f(N/|N|) = \sum_{i=1}^d \sum_{j=1}^d \frac{n_i}{|N|}\, |N|\, Q_{ij}(t, N/|N|)\,[f(N^{ij}/|N|) - f(N/|N|)], \tag{25} $$

or equivalently

$$ L^h_t f(x) = \sum_{i=1}^d \sum_{j=1}^d x_i Q_{ij}(t,x)\, \frac{1}{h}\,[f(x - h e_i + h e_j) - f(x)], \qquad x \in h\mathbb{Z}^d_+, \tag{26} $$


where $e_1,\dots,e_d$ denotes the standard basis in $\mathbb{R}^d$. With some abuse of notation, let us denote by $hN(t,h)$ the corresponding Markov chain. The transition operators of this chain will be denoted by

$$ \Phi^h_{s,t} f(hN) = \mathbf{E}_{s,hN} f(hN(t,h)), \qquad s \le t, \tag{27} $$

where $\mathbf{E}_{s,x}$ denotes the expectation of the chain started at $x$ at time $s$. These operators are known to form a propagator, i.e. they satisfy the chain rule (or Chapman-Kolmogorov equation)

$$ \Phi^h_{s,t}\Phi^h_{t,r} = \Phi^h_{s,r}, \qquad s \le t \le r. $$

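We shall be interested in the asymptotic behavior of these chains as $h \to 0$. To this end, let us observe that, for $f \in C^1(\Sigma_d)$,

$$ \lim_{|N|\to\infty,\ N/|N|\to x} |N|\,[f(N^{ij}/|N|) - f(N/|N|)] = \frac{\partial f}{\partial x_j}(x) - \frac{\partial f}{\partial x_i}(x), $$

so that

$$ \lim_{|N|\to\infty,\ N/|N|\to x} L^h_t f(N/|N|) = \Lambda_t f(x), $$

where

$$ \Lambda_t f(x) = \sum_{i=1}^d \sum_{j\neq i} x_i Q_{ij}(t,x)\Big[\frac{\partial f}{\partial x_j} - \frac{\partial f}{\partial x_i}\Big](x) = \sum_{k=1}^d \sum_{i\neq k} [x_i Q_{ik}(t,x) - x_k Q_{ki}(t,x)]\,\frac{\partial f}{\partial x_k}(x). \tag{28} $$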
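The limit above is a difference quotient in the direction $e_j - e_i$ with step $h = 1/|N|$. The short sketch below evaluates it for a hypothetical smooth test function $f(x) = x_1^2 + 3x_2x_3$ with $i = 1$, $j = 3$ at $x = (0.2, 0.3, 0.5)$. There the limit is $\partial f/\partial x_3 - \partial f/\partial x_1 = 3x_2 - 2x_1 = 0.5$, and the quotient equals exactly $0.5 + h$. Indices are 0-based in the code.

```python
from typing import Callable, List, Sequence


def scaled_jump_difference(f: Callable[[List[float]], float], N: Sequence[int], i: int, j: int) -> float:
    """Compute |N| [f(N^{ij}/|N|) - f(N/|N|)] for a state N with N[i] > 0."""
    n = sum(N)
    M = list(N)
    M[i] -= 1
    M[j] += 1  # the state N^{ij}
    return n * (f([m / n for m in M]) - f([k / n for k in N]))
```

With $N = (2, 3, 5)$ ($h = 0.1$) the value is $0.6$. With $N = (200000, 300000, 500000)$ it is within $10^{-5}$ of the limit $0.5$.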

The limiting operator $\Lambda_t f$ is a first-order PDO with characteristics solving the equations

$$ \dot x_k = \sum_{i\neq k} [x_i Q_{ik}(t,x) - x_k Q_{ki}(t,x)] = \sum_{i=1}^d x_i Q_{ik}(t,x), \qquad k = 1,\dots,d, \tag{29} $$

These are called the kinetic equations for the process of interaction described above. The second equality in (29) uses $-Q_{kk} = \sum_{i\neq k} Q_{ki}$. The characteristics specify the dynamics of the deterministic time-nonhomogeneous Markov Feller process in $\Sigma_d$ defined via the generator $\Lambda_t$. The corresponding transition operators act on $C(\Sigma_d)$ as

$$ \Phi_{s,t} f(x) = f(X_{s,x}(t)), \qquad s \le t, \tag{30} $$

where $X_{s,x}(t)$ is the solution to (29) with the initial condition $x$ at time $s$. These operators form a Feller propagator, i.e. $\Phi_{s,t}$ depend strongly continuously on $s,t$ and satisfy the chain rule $\Phi_{s,t}\Phi_{t,r} = \Phi_{s,r}$, $s \le t \le r$. Of course, if $Q$ does not depend on time $t$ explicitly, the operators $\Phi_{s,t}$ depend only on the difference $t - s$, and the operators $\Phi_{0,t}$ form a Feller semigroup.
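Numerically, the characteristics (29) are simply an ODE system on $\Sigma_d$. The sketch below solves this system with the classical RK4 scheme, which is an assumed choice of integrator, and uses hypothetical names. The usage check compares the result with the explicit two-state solution $x_1(t) = \frac{b}{a+b} + \big(x_1(0) - \frac{b}{a+b}\big)e^{-(a+b)t}$. It also shows a trajectory started at a vertex of $\Sigma_d$ moving into the interior when the relevant rates are positive.

```python
from typing import Callable, List

QField = Callable[[float, List[float]], List[List[float]]]


def kinetic_rhs(Q: QField, t: float, x: List[float]) -> List[float]:
    """Right-hand side of (29): dx_k/dt = sum_i x_i Q_ik(t, x)."""
    q = Q(t, x)
    d = len(x)
    return [sum(x[i] * q[i][k] for i in range(d)) for k in range(d)]


def solve_kinetic(Q: QField, x0: List[float], s: float, t: float, n_steps: int = 1000) -> List[float]:
    """Approximate X_{s,x0}(t) for (29) with the classical RK4 scheme."""
    x = list(x0)
    dt = (t - s) / n_steps
    tau = s
    for _ in range(n_steps):
        k1 = kinetic_rhs(Q, tau, x)
        k2 = kinetic_rhs(Q, tau + dt / 2, [a + dt / 2 * b for a, b in zip(x, k1)])
        k3 = kinetic_rhs(Q, tau + dt / 2, [a + dt / 2 * b for a, b in zip(x, k2)])
        k4 = kinetic_rhs(Q, tau + dt, [a + dt * b for a, b in zip(x, k3)])
        x = [a + dt / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
             for a, b1, b2, b3, b4 in zip(x, k1, k2, k3, k4)]
        tau += dt
    return x
```

Since the rows of $Q$ sum to zero, $\sum_k \dot x_k = 0$, and RK4 preserves this linear invariant up to rounding.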
Remark 2 It is easy to see that if $x_k > 0$, then $(X_{s,x}(t))_k > 0$ for any $t > s$. Indeed, by (29), $\dot x_k \ge Q_{kk}(t,x)\, x_k$, so $x_k$ can decay at most exponentially. Hence the boundary of $\Sigma_d$ is not attainable for this semigroup but, depending on $Q$, it can be glueing or not. For instance, if all off-diagonal elements of $Q$ never vanish, then the points $X_{s,x}(t)$ never belong to the boundary of $\Sigma_d$ for $t > s$, even if the initial point $x$ does.

Theorem 2.1 (i) Let all the elements $Q_{ij}(t,\cdot)$ belong to $C^{1,\alpha}(\Sigma_d)$, $\alpha \in (0,1]$, with norms uniformly bounded in $t$. Suppose that, for some $s \ge 0$ and $x \in \mathbb{R}^d$, the initial data $hN_s$ converge to $x$ in $\mathbb{R}^d$ as $h \to 0$. Then the Markov chains $hN(t,h)$ with the initial data $hN_s$ (generated by $L^h_t$ and with transitions $\Phi^h_{s,t}$) converge in distribution and in probability to the deterministic characteristic $X_{s,x}(t)$. For the corresponding converging propagators of transition operators, the following rates of convergence hold:

$$ \sup_{0\le s\le t\le T}\ \sup_{N\in\mathbb{Z}^d_+:\,|N|=1/h} \big|\Phi^h_{s,t} f(hN) - \Phi_{s,t} f(hN)\big| \le C(T)(t-s)\, h^\alpha \|f\|_{C^{1,\alpha}(\Sigma_d)} \tag{31} $$

for $f \in C^{1,\alpha}(\Sigma_d)$, and

$$ \sup_{0\le s\le t\le T} \big|\mathbf{E}_{s,hN} f(hN(t,h)) - f(X_{s,x}(t))\big| \le C(T)\big((t-s)h^\alpha \|f\|_{C^{1,\alpha}(\Sigma_d)} + \|f\|_{C^1(\Sigma_d)} |hN - x|\big), \tag{32} $$

where $C(T)$ depends only on the supremum in $t$ of the $C^{1,\alpha}(\Sigma_d)$-norm of the functions $Q(t,x)$.

(ii) Assume a weaker regularity condition, namely that $Q_{ij}(t,\cdot)$ belong to $C^1(\Sigma_d)$ uniformly in $t$. Then the convergence of the Markov chains $hN(t,h)$ in distribution and in probability to the deterministic characteristics still holds. Instead of (31), however, we have weaker rates in terms of the modulus of continuity $w_h$ of $\nabla f$ and $\nabla Q$:

$$ \sup_{0\le s\le t\le T}\ \sup_{N\in\mathbb{Z}^d_+:\,|N|=1/h} \big|\Phi^h_{s,t} f(hN) - \Phi_{s,t} f(hN)\big| \le C(T)(t-s)\big(w_{hC(T)}(\nabla f) + w_{hC(T)}(\nabla Q)\|f\|_{C^1(\Sigma_d)}\big), \tag{33} $$

where $C(T)$ depends on the $C^1(\Sigma_d)$-norm of $Q$. A similar modification of (32) holds.
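In the two-type case $d = 2$, the chain reduces to the number $k$ of type-1 particles, which is a birth-death process. Hence $\Phi^h_{0,T}f$ can be computed (up to ODE error) from the backward Kolmogorov equation and compared with $f(X_{0,x}(T))$ obtained from (29). The sketch below does this for the hypothetical smooth rates $Q_{12}(x) = 1 + x_2$, $Q_{21}(x) = 0.5$ and the test function $f(x) = x_1^2$. With such smooth $Q$, one expects, in line with (32) for $\alpha = 1$, an error that decreases as $h = 1/n$ decreases. The names and the fixed-step RK4 integrator are assumptions.

```python
from typing import Callable, List

Rate = Callable[[float], float]  # function of x_1 (with x_2 = 1 - x_1)


def _rk4(F: Callable[[List[float]], List[float]], y: List[float], T: float, n_steps: int) -> List[float]:
    """Classical RK4 for the autonomous system y' = F(y)."""
    dt = T / n_steps
    for _ in range(n_steps):
        k1 = F(y)
        k2 = F([a + dt / 2 * b for a, b in zip(y, k1)])
        k3 = F([a + dt / 2 * b for a, b in zip(y, k2)])
        k4 = F([a + dt * b for a, b in zip(y, k3)])
        y = [a + dt / 6 * (b1 + 2 * b2 + 2 * b3 + b4) for a, b1, b2, b3, b4 in zip(y, k1, k2, k3, k4)]
    return y


def lln_error(q12: Rate, q21: Rate, f: Callable[[float], float], n: int, k0: int, T: float) -> float:
    """|E f(x_1 of hN(T, h)) - f(X_1(T))| for d = 2, h = 1/n, started at x_1 = k0/n."""
    down = [k * q12(k / n) for k in range(n + 1)]      # k -> k - 1: a type-1 particle turns into type 2
    up = [(n - k) * q21(k / n) for k in range(n + 1)]  # k -> k + 1: a type-2 particle turns into type 1

    def backward(v: List[float]) -> List[float]:
        # generator of the chain applied to v (backward Kolmogorov equation v' = L v)
        return [down[k] * ((v[k - 1] if k > 0 else 0.0) - v[k])
                + up[k] * ((v[k + 1] if k < n else 0.0) - v[k]) for k in range(n + 1)]

    max_rate = max(a + b for a, b in zip(down, up))
    # step size 1/(4 max_rate) keeps RK4 well inside its stability region
    v = _rk4(backward, [f(k / n) for k in range(n + 1)], T, int(4 * max_rate * T) + 1)
    # kinetic equation (29) for d = 2: dx_1/dt = -x_1 Q_12(x) + x_2 Q_21(x)
    x = _rk4(lambda y: [-y[0] * q12(y[0]) + (1 - y[0]) * q21(y[0])], [k0 / n], T, 10000)
    return abs(v[k0] - f(x[0]))
```

Running it for $n = 20$ and $n = 80$ from $x_1 = 0.5$ shows the gap between the chain and its deterministic limit shrinking as $n$ grows.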


Our  objective  is  to  extend  this  result  to  interacting  and  competitively  controlled  families  of 
Markov  chains. 

Mean  field  Markov  control 

Turning to control dynamics, let us start with mean-field controlled Markov chains without competition. Suppose we are given a family of Q-matrices $\{Q(t,u,x)\} = \{(Q_{ij})(t,u,x),\ i,j = 1,\dots,d\}$, depending on $x \in \Sigma_d$, $t \ge 0$ and a parameter $u$ from a metric space, interpreted as a control. The main assumption will be that $Q \in C^{1,\alpha}(\Sigma_d)$ as a function of $x$, with the norm bounded uniformly in $t, u$, and that $Q$ depends continuously on $t$ and $u$.

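Any given bounded measurable curve $u(t)$, $t \in [0,T]$, defines a Markov chain on $\Sigma^h_d$ with the time-dependent family of generators of type (25), that is

$$ L^h_{t,u(t)} f(N/|N|) = \sum_{i=1}^d \sum_{j=1}^d n_i Q_{ij}(t,u(t),N/|N|)\,[f(N^{ij}/|N|) - f(N/|N|)], \tag{34} $$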
or equivalently

$$ L^h_{t,u(t)} f(x) = \sum_{i=1}^d \sum_{j=1}^d x_i Q_{ij}(t,u(t),x)\,\frac{1}{h}\,[f(x - h e_i + h e_j) - f(x)]. \tag{35} $$

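For simplicity (and effectively without loss of generality), we shall from now on restrict ourselves to controls $u(\cdot)$ from the class $C_{pc}[0,T]$ of piecewise-continuous curves (with a finite number of discontinuities). Again, for $f \in C^1(\Sigma_d)$,

$$ \lim_{h = 1/|N| \to 0,\ N/|N| \to x} L^h_{t,u(t)} f(N/|N|) = \Lambda_{t,u(t)} f(x), $$

where

$$ \Lambda_{t,u(t)} f(x) = \sum_{k=1}^d \sum_{i\neq k} [x_i Q_{ik}(t,u(t),x) - x_k Q_{ki}(t,u(t),x)]\,\frac{\partial f}{\partial x_k}(x), \tag{36} $$

with the corresponding controlled characteristics governed by the equations

$$ \dot x_k = \sum_{i\neq k} [x_i Q_{ik}(t,u(t),x) - x_k Q_{ki}(t,u(t),x)] = \sum_{i=1}^d x_i Q_{ik}(t,u(t),x), \qquad k = 1,\dots,d. \tag{37} $$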

For a given $T > 0$ and continuous functions $J$ (current payoff) and $V_T$ (terminal payoff), let $\Gamma(T,h)$ denote the problem of a centralized controller of the chain with $|N| = 1/h$ particles, aiming at maximizing the payoff

$$ \mathbf{E}\Big[\int_t^T J\Big(s, u(s), \frac{N(s,h)}{|N|}\Big) ds + V_T\Big(\frac{N(T,h)}{|N|}\Big)\Big]. \tag{38} $$


The optimal payoff will be denoted by $V^h(t,x)$:

$$ V^h(t,x) = \sup_{u(\cdot)\in C_{pc}[t,T]} \mathbf{E}^{u(\cdot)}_{t,x} \Big[\int_t^T J(s, u(s), hN(s,h))\, ds + V_T(hN(T,h))\Big], \tag{39} $$

where $\mathbf{E}^{u(\cdot)}_{t,x}$ denotes the expectation with respect to the Markov chain on $\Sigma^h_d$ generated by (34) and started at $x = hN$ at time $t$.

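We aim to approximate $V^h(t,x)$ by the optimal payoff

$$ V(t,x) = \sup_{u(\cdot)\in C_{pc}[t,T]} \Big[\int_t^T J(s,u(s),X_{t,x}(s))\, ds + V_T(X_{t,x}(T))\Big] \tag{40} $$

for the controlled dynamics (37), where $X_{t,x}(s)$ denotes the solution of (37) with $X_{t,x}(t) = x$.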
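For $d = 2$, the limiting problem (40) is a one-dimensional deterministic control problem in the variable $x_1$ (with $x_2 = 1 - x_1$). The minimal sketch below assumes time-independent rates and restricts attention to constant controls from a finite set. This class is much smaller than $C_{pc}$, so the maximum found is only a lower bound for $V(t,x)$. The sketch evaluates the payoff along the characteristics (37). The rates in the usage check (the controller pushes type 1 into type 2 at rate $u$) and all names are hypothetical.

```python
from typing import Callable, Sequence, Tuple


def payoff_constant_control(
    u: float, x0: float, T: float,
    q12: Callable[[float, float], float], q21: Callable[[float, float], float],
    J: Callable[[float, float, float], float], VT: Callable[[float], float],
    n_steps: int = 2000,
) -> float:
    """Payoff in (40) for d = 2, initial time t = 0 and a constant control u.

    State x_1 (x_2 = 1 - x_1); dynamics (37): dx_1/ds = -x_1 Q_12(u, x) + x_2 Q_21(u, x).
    The running payoff is integrated with the trapezoidal rule along an RK4 trajectory.
    """
    def F(x1: float) -> float:
        return -x1 * q12(u, x1) + (1.0 - x1) * q21(u, x1)

    dt = T / n_steps
    x1, s, total = x0, 0.0, 0.0
    for _ in range(n_steps):
        k1 = F(x1)
        k2 = F(x1 + dt / 2 * k1)
        k3 = F(x1 + dt / 2 * k2)
        k4 = F(x1 + dt * k3)
        x_next = x1 + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        total += dt / 2 * (J(s, u, x1) + J(s + dt, u, x_next))
        x1, s = x_next, s + dt
    return total + VT(x1)


def best_constant_control(controls: Sequence[float], x0: float, T: float, q12, q21, J, VT) -> Tuple[float, float]:
    """Return (payoff, u) maximizing over a finite set of constant controls."""
    return max((payoff_constant_control(u, x0, T, q12, q21, J, VT), u) for u in controls)
```

With $u = 0$ and $Q_{21} = 0.5$, the trajectory is $x_1(s) = 1 - (1 - x_1(0))e^{-s/2}$, which the check reproduces.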

We can also obtain an approximate optimal synthesis for problems $\Gamma(T,h)$ with large $|N| = 1/h$, at least if a sufficiently regular synthesis is available for the limiting system. Recall that a function $\gamma(t,x)$ is called an optimal synthesis (or an adaptive policy) for the problem $\Gamma(T,h)$ if

$$ V^h(t,x) = \mathbf{E}^{\gamma}_{t,x}\Big[\int_t^T J(s, \gamma(s, hN(s,h)), hN(s,h))\, ds + V_T(hN(T,h))\Big] \tag{41} $$

for all $t \le T$ and $x \in \Sigma^h_d$. Here $\mathbf{E}^{\gamma}_{t,x}$ denotes the expectation with respect to the Markov chain on $\Sigma^h_d$ generated by (34) with $u(t) = \gamma(t,x)$ and starting at $x = hN$ at time $t$. A function $\gamma(t,x)$ is called an $\epsilon$-optimal synthesis, or an $\epsilon$-adaptive policy, if the r.h.s. of (41) differs from its l.h.s. by not more than $\epsilon$. An optimal synthesis (adaptive policy) is defined similarly for the limiting deterministic system.


Theorem 2.2 (i) Assume that $Q, J$ depend continuously on $t, u$ and $Q, J \in C^{1,\alpha}(\Sigma_d)$, $\alpha \in (0,1]$, as functions of $x$, with the norms bounded uniformly in $t, u$. Assume finally that $V_T \in C^{1,\alpha}(\Sigma_d)$. Then

$$ \sup_{0\le t\le T} \big|V^h(t,hN) - V(t,x)\big| \le C(T)\big((T-t)h^\alpha + |hN - x|\big)\Big(\|V_T\|_{C^{1,\alpha}(\Sigma_d)} + \sup_{t,u}\|J(t,u,\cdot)\|_{C^{1,\alpha}(\Sigma_d)}\Big), \tag{42} $$

with $C(T)$ depending only on the bounds of the norms of $Q$ in $C^{1,\alpha}(\Sigma_d)$. Moreover, let $u(t)$ be an $\epsilon$-optimal control for the deterministic dynamics (37), that is, the payoff obtained by using $u(\cdot)$ differs by at most $\epsilon$ from $V(t,x)$. Then $u(\cdot)$ is also an $(\epsilon + C(T)h^\alpha)$-optimal control for the $|N| = 1/h$ particle system.

(ii) Suppose additionally that $u$ belongs to a convex subset of a Euclidean space and that $Q(t,u,x)$ depends Lipschitz continuously on $u$. Let $\epsilon > 0$, and let $\gamma(t,x)$ be a function, Lipschitz continuous in $x$ uniformly in $t$, that represents an $\epsilon$-optimal synthesis for the limiting deterministic control problem. Then, for any $\delta > 0$, there exists $h_0$ such that, for $h \le h_0$, $\gamma(t,x)$ is an $(\epsilon + \delta)$-optimal synthesis for the approximate optimal problem $\Gamma(T,h)$ on $\Sigma^h_d$.


Notice finally that, by standard dynamic programming, the optimal payoff $V(t,x)$ given by (40) represents the unique viscosity solution of the HJB equation

$$ \frac{\partial V}{\partial t}(t,x) + \max_u \Big[J(t,u,x) + \sum_{i,k=1}^d x_i Q_{ik}(t,u,x)\frac{\partial V}{\partial x_k}(t,x)\Big] = 0, \qquad V(T,x) = V_T(x), \tag{43} $$


and the optimal payoff $V^h(t,x)$ given by (39) solves the HJB equation

$$ \frac{\partial V^h}{\partial t}(t,x) + \max_u \big[J(t,u,x) + L^h_{t,u} V^h(t,x)\big] = 0. \tag{44} $$

Thus,  as  a  corollary  of  Theorem  2.2,  we  have  proved  the  convergence  of  the  solutions  of  the 
Cauchy  problem  for  equation  (44)  to  the  viscosity  solution  of  (43). 
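Equation (43) suggests the usual dynamic-programming discretization. The following sketch assumes $d = 2$, time-independent rates and a finite control set, all for illustration only. It marches backward in time with a semi-Lagrangian step and linear interpolation in $x_1$. This is a generic textbook scheme for deterministic HJB equations, not one taken from the report, and the names are hypothetical.

```python
from bisect import bisect_right
from typing import Callable, List, Sequence


def _interp(grid: List[float], values: List[float], x: float) -> float:
    """Piecewise-linear interpolation on an increasing grid, clamped at its ends."""
    if x <= grid[0]:
        return values[0]
    if x >= grid[-1]:
        return values[-1]
    k = bisect_right(grid, x) - 1
    w = (x - grid[k]) / (grid[k + 1] - grid[k])
    return (1 - w) * values[k] + w * values[k + 1]


def value_function_dp(
    controls: Sequence[float], T: float,
    q12: Callable[[float, float], float], q21: Callable[[float, float], float],
    J: Callable[[float, float, float], float], VT: Callable[[float], float],
    n_x: int = 200, n_t: int = 200,
) -> List[float]:
    """Approximate V(0, x) from (43) for d = 2 on the grid x_1 = k / n_x.

    Semi-Lagrangian step: V(t, x) ~ max_u [J(t, u, x) dt + V(t + dt, x + dt * F(x, u))],
    where F(x, u) = -x_1 Q_12(u, x) + x_2 Q_21(u, x) is the right-hand side of (37).
    """
    grid = [k / n_x for k in range(n_x + 1)]
    dt = T / n_t
    V = [VT(x1) for x1 in grid]  # terminal condition V(T, x) = V_T(x)
    for step in range(n_t, 0, -1):
        t = (step - 1) * dt
        V = [max(J(t, u, x1) * dt
                 + _interp(grid, V, x1 + dt * (-x1 * q12(u, x1) + (1 - x1) * q21(u, x1)))
                 for u in controls)
             for x1 in grid]
    return V
```

With a single control the scheme simply transports $V_T$ along (37). Enlarging the control set can only increase the computed value, mirroring the $\max_u$ in (43).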

Two  players  with  mean-field  or  binary  interaction 

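Let us turn to a game-theoretic setting, starting with the simplest model of two competing mean-field interacting Markov chains. Suppose we are given two families of Q-matrices $\{Q(t,u,x) = (Q_{ij})(t,u,x)\}$ and $\{P(t,v,x) = (P_{ij})(t,v,x)\}$, $i,j = 1,\dots,d$, depending on $x \in \Sigma_d$ and on parameters $u$ and $v$ from two subsets $U$ and $V$ of Euclidean spaces. Any given bounded measurable curves $u(t), v(t)$, $t \in [0,T]$, define a Markov chain on $\Sigma_d \times \Sigma_d$ specified by the generator

$$ \begin{aligned} L_{t,u(t),v(t)} f\Big(\frac{N}{|N|},\frac{M}{|M|}\Big) &= \sum_{i,j=1}^d n_i Q_{ij}\Big(t,u(t),\frac{N}{|N|}\Big)\Big[f\Big(\frac{N^{ij}}{|N|},\frac{M}{|M|}\Big) - f\Big(\frac{N}{|N|},\frac{M}{|M|}\Big)\Big] \\ &\quad + \sum_{i,j=1}^d m_i P_{ij}\Big(t,v(t),\frac{M}{|M|}\Big)\Big[f\Big(\frac{N}{|N|},\frac{M^{ij}}{|M|}\Big) - f\Big(\frac{N}{|N|},\frac{M}{|M|}\Big)\Big], \end{aligned} \tag{45} $$

where $N = (n_1,\dots,n_d)$ and $M = (m_1,\dots,m_d)$.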

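We shall assume for simplicity that $|N| = |M| = 1/h$.

Then (45) can be rewritten as

$$ \begin{aligned} L^h_{t,u(t),v(t)} f(x,y) &= \sum_{i=1}^d \sum_{j=1}^d x_i Q_{ij}(t,u(t),x)\,\frac{1}{h}\,[f(x - h e_i + h e_j, y) - f(x,y)] \\ &\quad + \sum_{i=1}^d \sum_{j=1}^d y_i P_{ij}(t,v(t),y)\,\frac{1}{h}\,[f(x, y - h e_i + h e_j) - f(x,y)], \qquad x, y \in h\mathbb{Z}^d_+. \end{aligned} $$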

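For $f \in C^1(\Sigma_d \times \Sigma_d)$,

$$ \lim_{|N|=|M|\to\infty,\ N/|N|\to x,\ M/|M|\to y} L^h_{t,u(t),v(t)} f(N/|N|, M/|M|) = \Lambda_{t,u(t),v(t)} f(x,y), $$

where

$$ \begin{aligned} \Lambda_{t,u(t),v(t)} f(x,y) &= \sum_{k=1}^d \sum_{i\neq k} [x_i Q_{ik}(t,u(t),x) - x_k Q_{ki}(t,u(t),x)]\,\frac{\partial f}{\partial x_k}(x,y) \\ &\quad + \sum_{k=1}^d \sum_{i\neq k} [y_i P_{ik}(t,v(t),y) - y_k P_{ki}(t,v(t),y)]\,\frac{\partial f}{\partial y_k}(x,y). \end{aligned} \tag{47} $$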

The corresponding controlled characteristics are governed by the equations

$$ \dot x_k = \sum_{i\neq k} [x_i Q_{ik}(t,u(t),x) - x_k Q_{ki}(t,u(t),x)] = \sum_{i=1}^d x_i Q_{ik}(t,u(t),x), \qquad k = 1,\dots,d, \tag{48} $$

$$ \dot y_k = \sum_{i\neq k} [y_i P_{ik}(t,v(t),y) - y_k P_{ki}(t,v(t),y)] = \sum_{i=1}^d y_i P_{ik}(t,v(t),y), \qquad k = 1,\dots,d. \tag{49} $$

For a given $T > 0$, let us denote by $\Gamma(T,h)$ the stochastic game with the dynamics specified by the generator (45) and with the objective of player I (controlling $Q$ via $u$) to maximize the payoff

$$ \mathbf{E}\Big[\int_t^T J\Big(s,u(s),v(s),\frac{N(s,h)}{|N|},\frac{M(s,h)}{|M|}\Big) ds + V_T\Big(\frac{N(T,h)}{|N|},\frac{M(T,h)}{|M|}\Big)\Big] \tag{50} $$

for given functions $J$ (current payoff) and $V_T$ (terminal payoff). The objective of player II (controlling $P$ via $v$) is to minimize this payoff (zero-sum game). As previously, we want to approximate this game by the deterministic zero-sum differential game $\Gamma(T)$, defined by the dynamics (48), (49) and the payoff of player I given by

$$ \int_t^T J(s,u(s),v(s),X_{t,x}(s),Y_{t,y}(s))\, ds + V_T(X_{t,x}(T),Y_{t,y}(T)). \tag{51} $$
Recall the basic notions of the upper and lower values for a game $\Gamma(T)$. As above, we shall use controls $u(\cdot)$ and $v(\cdot)$ from the classes $C_{pc}([0,T];U)$ and $C_{pc}([0,T];V)$ of piecewise-continuous curves with values in $U$ and $V$ respectively. A progressive strategy of player I is defined as a mapping $\beta$ from $C_{pc}([0,T];V)$ to $C_{pc}([0,T];U)$ with the following property: if $v_1(\cdot)$ and $v_2(\cdot)$ coincide on some initial interval $[0,t]$, $t < T$, then so do $u_1 = \beta(v_1(\cdot))$ and $u_2 = \beta(v_2(\cdot))$. Progressive strategies are defined similarly for player II. Let us denote the sets of progressive strategies for players I and II by $S_p([0,T];U)$ and $S_p([0,T];V)$. Then the upper and the lower values for the game $\Gamma(T)$ are defined as

$$ \begin{aligned} V^+(t,x,y) &= \sup_{\beta\in S_p([0,T];U)}\ \inf_{v(\cdot)\in C_{pc}([0,T];V)} \Big[\int_t^T J(s,(\beta(v))(s),v(s),X_{t,x}(s),Y_{t,y}(s))\, ds + V_T(X_{t,x}(T),Y_{t,y}(T))\Big], \\ V^-(t,x,y) &= \inf_{\beta\in S_p([0,T];V)}\ \sup_{u(\cdot)\in C_{pc}([0,T];U)} \Big[\int_t^T J(s,u(s),(\beta(u))(s),X_{t,x}(s),Y_{t,y}(s))\, ds + V_T(X_{t,x}(T),Y_{t,y}(T))\Big]. \end{aligned} \tag{52} $$

Suppose the so-called Isaacs condition holds, that is, for any $p_k, q_k$,


$$ \max_u \min_v \Big[J(t,u,v,x,y) + \sum_{i,k=1}^d x_i Q_{ik}(t,u,x)\, q_k + \sum_{i,k=1}^d y_i P_{ik}(t,v,y)\, p_k\Big] = \min_v \max_u \Big[J(t,u,v,x,y) + \sum_{i,k=1}^d x_i Q_{ik}(t,u,x)\, q_k + \sum_{i,k=1}^d y_i P_{ik}(t,v,y)\, p_k\Big]; \tag{53} $$


then  the  upper  and  lower  values  coincide:  V+(t,x,y)  =  V-(t,x,y). 
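Whether the Isaacs condition (53) holds can be probed numerically on finite control grids. The sketch below evaluates the bracket in (53) and compares max-min with min-max. It uses hypothetical names, and finite control sets stand in for $U$ and $V$. Because $Q$ depends only on $u$ and $P$ only on $v$, the bracket splits into a $u$-part plus a $v$-part whenever $J$ does, and then the two sides agree. The coupled payoff $J = uv$ on $\{-1, 1\}$ gives a strict gap, with max-min $= -1$ and min-max $= 1$.

```python
from typing import Callable, List, Sequence, Tuple

Matrix = List[List[float]]


def isaacs_hamiltonian(u: float, v: float, x: List[float], y: List[float],
                       q: List[float], p: List[float],
                       J: Callable[[float, float, List[float], List[float]], float],
                       Q: Callable[[float, List[float]], Matrix],
                       P: Callable[[float, List[float]], Matrix]) -> float:
    """Bracket in (53): J + sum_{i,k} x_i Q_ik(u, x) q_k + sum_{i,k} y_i P_ik(v, y) p_k."""
    Qm, Pm = Q(u, x), P(v, y)
    d = len(x)
    return (J(u, v, x, y)
            + sum(x[i] * Qm[i][k] * q[k] for i in range(d) for k in range(d))
            + sum(y[i] * Pm[i][k] * p[k] for i in range(d) for k in range(d)))


def lower_upper(H: Callable[[float, float], float], U: Sequence[float],
                V: Sequence[float]) -> Tuple[float, float]:
    """Return (max_u min_v H, min_v max_u H); the first never exceeds the second."""
    lower = max(min(H(u, v) for v in V) for u in U)
    upper = min(max(H(u, v) for u in U) for v in V)
    return lower, upper
```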

Similarly, the upper and the lower values $V^+_h(t,x,y)$ and $V^-_h(t,x,y)$ for the stochastic game $\Gamma(T,h)$ are defined.


Theorem 2.3 Assume that $Q, P, J$ depend continuously on $t, u, v$ and $Q, P, J, V_T \in C^{1,\alpha}(\Sigma_d)$, $\alpha \in (0,1]$, as functions of $x$, with the norms bounded uniformly in $t, u, v$. Then

$$ \sup_{0\le t\le T} \big|V^\pm_h(t,hN) - V^\pm(t,x)\big| \le C(T)\big((T-t)h^\alpha + |hN - x|\big)\Big(\|V_T\|_{C^{1,\alpha}(\Sigma_d)} + \sup_{t,u,v}\|J(t,u,v,\cdot)\|_{C^{1,\alpha}(\Sigma_d)}\Big), \tag{54} $$

with $C(T)$ depending only on the bounds of the norms of $Q$ in $C^{1,\alpha}(\Sigma_d)$. Moreover, if $\beta \in S_p([0,T];U)$ and $v(\cdot) \in C_{pc}([0,T];V)$ are $\epsilon$-optimal for the minimax problem (52), then this pair is also $(\epsilon + C(T)h^\alpha)$-optimal for the corresponding stochastic game $\Gamma(T,h)$.

As in Theorem 2.2 (ii), one can also approximate optimal (equilibrium) adaptive policies for $\Gamma(T,h)$, if sufficiently regular (i.e. Lipschitz continuous) equilibrium adaptive policies exist for the limiting game $\Gamma(T)$. In fact, as is known from the theory of differential games, the upper value $V^+(t,x,y)$ represents the unique viscosity solution of the upper Isaacs equation

$$ \frac{\partial V^+}{\partial t}(t,x,y) + \min_v \max_u \big[J(t,u,v,x,y) + \Lambda_{t,u,v} V^+(t,x,y)\big] = 0, \qquad V^+(T,x,y) = V_T(x,y), \tag{55} $$

and $V^-(t,x,y)$ represents the unique viscosity solution of the lower Isaacs equation (with min and max placed in the opposite order).
Similar equations are satisfied by the values of stochastic games $V^\pm_h(t,x,y)$ (see e.g. [?]). Now, let $V^*$ be a solution to the Cauchy problem (55), and suppose there exist Lipschitz continuous functions $v^*(t,x,y)$ and $u^*(t,v,x,y)$ such that

$$ u^*(t,v,x,y) \in \operatorname{argmax}_u \big[J(t,u,v,x,y) + \Lambda_{t,u,v} V^*(t,x,y)\big], \qquad v^*(t,x,y) \in \operatorname{argmin}_v \max_u \big[J(t,u,v,x,y) + \Lambda_{t,u,v} V^*(t,x,y)\big]. $$

Then $V^*$ is the value of the differential game $\Gamma^+(T)$, which gives the information advantage to the maximizing player I, and the policies $(u^*, v^*)$ form a saddle point for this game. Analogously to Theorem 2.2 (ii), we can conclude by Theorem 2.3 that the policies $v^*(t,x,y)$ and $u^*(t,v,x,y)$ represent $\epsilon$-equilibria for the corresponding stochastic game $\Gamma^+(T,h)$.

In a slightly different setting, one can assume that changes in a competitive control process occur as a result of group interactions, rather than being determined just by the overall mean-field distribution. Let us discuss a simple situation with binary interaction. Assume we have two groups of $d$ states (of objects or agents) controlled by players I and II respectively. Suppose now that any particle from a state $i$ of the first group can interact with any particle from a state $j$ of the second group (binary interaction). Such an interaction changes $i$ to $l$ and $j$ to $r$ with certain rates $Q^{lr}_{ij}(t,u,v)$ that may depend on the controls $u$ and $v$ of the players. Assuming, as usual, that our particles are indistinguishable (any particle from a state is selected for interaction with equal probability) leads to the process generated by the operators


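$$ L_{t,u,v} f(N,M) = \sum_{i,j,l,r=1}^d n_i m_j Q^{lr}_{ij}(t,u,v)\,[f(N^{il},M^{jr}) - f(N,M)]. \tag{56} $$

Again, let us assume for simplicity that $|M| = |N|$ and define $h = 1/|N| = 1/|M|$. To get a reasonable scaling limit, it is necessary to scale time by the factor $h$, leading to the generators

$$ L^h_{t,u(t),v(t)} f(x,y) = \frac{1}{h}\sum_{i,j,l,r=1}^d x_i y_j Q^{lr}_{ij}(t,u(t),v(t))\,[f(x - h e_i + h e_l,\ y - h e_j + h e_r) - f(x,y)]. $$

For $x = hN$, $y = hM$ and $h \to 0$, these generators tend to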


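$$ \Lambda_{t,u(t),v(t)} f(x,y) = \sum_{i,j,l,r=1}^d x_i y_j Q^{lr}_{ij}(t,u(t),v(t))\Big[\frac{\partial f}{\partial x_l} + \frac{\partial f}{\partial y_r} - \frac{\partial f}{\partial x_i} - \frac{\partial f}{\partial y_j}\Big](x,y). \tag{57} $$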


The corresponding kinetic equations (characteristics of this first-order partial differential operator) have the form

$$ \dot x_k = \sum_{i,j,r=1}^d \big[x_i y_j Q^{kr}_{ij}(t,u(t),v(t)) - x_k y_j Q^{ir}_{kj}(t,u(t),v(t))\big], $$

$$ \dot y_k = \sum_{i,j,l=1}^d \big[x_i y_j Q^{lk}_{ij}(t,u(t),v(t)) - x_i y_k Q^{lj}_{ik}(t,u(t),v(t))\big]. $$
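The kinetic equations above are easiest to read as flux bookkeeping. Every interaction type $(i,j) \to (l,r)$ moves mass $x_i y_j Q^{lr}_{ij}$ out of $i$ and $j$ and into $l$ and $r$. The minimal sketch below computes both right-hand sides, with the rates frozen at given values of $t, u(t), v(t)$ and stored in a dictionary. This representation and the function name are hypothetical. By construction, each group's total mass is conserved.

```python
from typing import Dict, List, Tuple

Rates = Dict[Tuple[int, int, int, int], float]  # (i, j, l, r) -> Q^{lr}_{ij} at the current (t, u, v)


def binary_kinetic_rhs(x: List[float], y: List[float], rates: Rates) -> Tuple[List[float], List[float]]:
    """Right-hand sides of the kinetic equations for binary interaction.

    Each entry (i, j, l, r) contributes the flux x_i y_j Q^{lr}_{ij}: mass moves from i to l
    in the first group and from j to r in the second group.
    """
    dx = [0.0] * len(x)
    dy = [0.0] * len(y)
    for (i, j, l, r), rate in rates.items():
        flux = x[i] * y[j] * rate
        dx[i] -= flux
        dx[l] += flux
        dy[j] -= flux
        dy[r] += flux
    return dx, dy
```

For instance, a single rate $Q^{10}_{00} = 2$ with $x = (0.5, 0.5)$ and $y = (0.4, 0.6)$ moves mass $0.4$ per unit time from type 0 to type 1 in the first group, while leaving the second group unchanged.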


As in the previous setting, we are interested in the zero-sum stochastic game, again denoted by $\Gamma(T,h)$. Its dynamics are specified by the generator (56), and player I (using the control $u$) aims to maximize a payoff of the same type as (50). We are also interested in the approximation of this game by the limiting deterministic zero-sum differential game $\Gamma(T)$, defined by the payoff (51) of player I.


Theorem 2.4 Assume that $Q, J$ depend continuously on $t, u, v$ and $Q, J, V_T \in C^{1,\alpha}(\Sigma_d)$, $\alpha \in (0,1]$, as functions of $x$, with the norms bounded uniformly in $t, u, v$. Then the same estimate (54) holds for the differences between the upper (respectively, lower) values of the limiting and the approximating games.


The  theory  was  also  partially  extended  to  the  case  of  K  players. 